PREFACE


These unforeseen stoppages, which I own I had no conception of when I first set out; but which, I am convinced now, will rather increase than diminish as I advance, — have struck out a hint which I am resolved to follow; — and that is, — not to be in a hurry; but to go on leisurely, writing and publishing two volumes of my life every year; — which, if I am suffered to go on quietly, and can make a tolerable bargain with my bookseller, I shall continue to do as long as I live.

— LAURENCE STERNE, The Life and Opinions of Tristram Shandy, Gentleman (1760)


This booklet contains draft material that I’m circulating to experts in the
field, in hopes that they can help remove its most egregious errors before too
many other people see it. I am also, however, posting it on the Internet for
courageous and/or random readers who don’t mind the risk of reading a few
pages that have not yet reached a very mature state. Beware: This material has
not yet been proofread as thoroughly as the manuscripts of Volumes 1, 2, and 3
were at the time of their first printings. And those carefully-checked volumes,
alas, were subsequently found to contain thousands of mistakes.

Given this caveat, I hope that my errors this time will not be so numerous
and/or obtrusive that you will be discouraged from reading the material carefully.
I did try to make the text both interesting and authoritative, as far as it goes.
But the field is vast; I cannot hope to have surrounded it enough to corral it
completely. So I beg you to let me know about any deficiencies that you discover.

To put the material in context, this pre-fascicle contains Section 7.1.3 of a
long, long chapter on combinatorial algorithms. Chapter 7 will eventually fill
at least three volumes (namely Volumes 4A, 4B, and 4C), assuming that I’m
able to remain healthy. It will begin with a short review of graph theory, with
emphasis on some highlights of significant graphs in the Stanford GraphBase,
from which I will be drawing many examples. Then comes Section 7.1: Zeros
and Ones, beginning with basic material about Boolean operations in Section
7.1.1 and Boolean evaluation in Section 7.1.2. Section 7.1.3, which you’re about
to read here, applies these ideas to make computer programs run fast. Section
7.1.4 will then discuss the representation of Boolean functions.

The next part, 7.2, is about generating all possibilities, and it begins with 
Section 7.2.1: Generating Basic Combinatorial Patterns. Fascicles for this section 
have already appeared on the Web and/or in print. Section 7.2.2 will deal with 
backtracking in general. And so it will continue, if all goes well; an outline of 
the entire Chapter 7 as currently envisaged appears on the taocp webpage that
is cited on page ii. 

This part of The Art of Computer Programming has probably been more fun
to write than any other so far. Indeed, I’ve spent more than 30 years collecting
material for Section 7.1.3; finally I’m able to assemble these goodies together
and segue through them.

Most of Volume 4 will deal with abstract concepts, and there will be little
or no need to say much about a computer’s machine language. Volumes 1–3
have already dealt with most of the important ideas about programming at that
level. But Section 7.1.3 is a notable exception: Here we often want to see the
very pulse of the machine.

Therefore I strongly recommend that readers become familiar with the
basics of the MMIX computer, explained in Volume 1 Fascicle 1, in order to fully
appreciate the bitwise tricks and techniques described here. Cross-references
to Sections 1.3.1′ and 1.3.2′ in the present booklet refer to that fascicle. I’ve
reprinted the basic MMIX opcode-and-timing chart, Table 1.3.1′–1, at the end of
this booklet for convenience, together with a list of ASCII codes.

The topic of Boolean functions and bit manipulation can of course be
interpreted so broadly that it encompasses the entire subject of computer
programming. The real goal of this fascicle is to focus on concepts that appear at the
lowest levels, concepts on which we can erect significant superstructures. And
even these apparently lowly notions turn out to be surprisingly rich, with explicit
ties to sections 1.2.4, 1.2.5, 1.2.8, 2.3.1, 2.3.3, 2.3.4.2, 2.3.5, 3.1, 3.2.2, 4.1, 4.4,
4.5.3, 4.5.4, 4.6.1, 4.6.2, 4.6.3, 4.6.4, 5, 5.2.2, 5.2.3, 5.2.5, and 5.3.4 of the first
three volumes. I strongly believe in building up a firm foundation, so I have
discussed Boolean topics much more thoroughly than I will be able to do with
material that is newer or less basic. Section 7.1.3 presented me with an extreme
embarrassment of riches: After typing the manuscript I was astonished to
discover that I had come up with 211 exercises, even though (believe it or not) I
had to eliminate quite a lot of the interesting material that appears in my files.

My notes on combinatorial algorithms have been accumulating for more
than forty years, so I fear that in several respects my knowledge is woefully
behind the times. Please look, for example, at the exercises that I’ve classed as
research problems (rated with difficulty level 46 or higher), namely exercises 61,
76, 112, 117, 126, 128, 129, 130, and 174; I’ve also implicitly mentioned or posed
additional unsolved questions in the answers to exercises 21, 140, 141, 156, and
165. Are those problems still open? Please inform me if you know of a solution
to any of these intriguing questions. And of course if no solution is known today
but you do make progress on any of them in the future, I hope you’ll let me know.

I urgently need your help also with respect to some exercises that I made up
as I was preparing this material. I certainly don’t like to receive credit for things
that have already been published by others, and most of these results are quite
natural “fruits” that were just waiting to be “plucked.” Therefore please tell
me if you know who deserves to be credited, with respect to the ideas found in
exercises 5, 6, 20, 26, 34, 39, 49, 50, 53, 57, 58(d, e), 59, 60, 72, 78, 80, 81, 82, 83,
84, 86, 90, 95, 110, 115, 116, 120, 121, 127, 146, 154, 155, 159, 168, 183, 193, and
198, and/or the answers to exercises 17, 18, and 139. Furthermore I’ve credited
exercises 45 and 54 to unpublished work of Tom Rokicki and Bill Gosper. Have
either of those results ever appeared in print, to your knowledge?

I shall happily pay a finder’s fee of $2.56 for each error in this draft when it is
first reported to me, whether that error be typographical, technical, or historical.
The same reward holds for items that I forgot to put in the index. And valuable
suggestions for improvements to the text are worth 32¢ each. (Furthermore, if
you find a better solution to an exercise, I’ll actually reward you with immortal
glory instead of mere money, by publishing your name in the eventual book :-)
Cross references to yet-unwritten material sometimes appear as ‘00’; this
impossible value is a placeholder for the actual numbers to be supplied later.

Happy reading! 

Stanford, California    D. E. K.

16 December 2006 


[These techniques] are instances of general mathematical principles 
waiting to be discovered, if an appropriate setting is created. 

Such a setting would be a calculus of bitmap operations, so one can learn 
to use these operations just as naturally as arithmetic operations on numbers. 

— L. J. GUIBAS and J. STOLFI, ACM Transactions on Graphics (1982)

A nice mixture of boolean and numeric functions — 

a suitable exercise for biturgical acolytes. 

— R. W. GOSPER (1996) 

A note on notation. Several formulas in Section 7.1.3 use the notation $\langle xyz\rangle$
for the median function (aka majority function) that is discussed extensively in
Section 7.1.1. Other formulas use the notation $x \mathbin{\dot{-}} y$ for the monus function
(aka dot-minus or saturated subtraction), which was defined in Section
Hexadecimal constants are preceded by a sharp sign: #123 means $(123)_{16}$. If you
run across other notations that may be unfamiliar, please look at the Index to
Notations at the end of Volumes 1, 2, or 3, and/or the entries under “Notation”
in the index to the present booklet. Of course Volume 4 will some day contain
its own Index to Notations.


Lady Caroline. Psha! that’s such a hack! 

Sir Simon. A hack, Lady Caroline, that
the knowing ones have warranted sound.

— GEORGE COLMAN, John Bull, Act 3, Scene 1 (1803) 

7.1.3. Bitwise Tricks and Techniques 

Now comes the fun part: We get to use Boolean operations in our programs. 

People are more familiar with arithmetic operations like addition, subtraction,
and multiplication than they are with bitwise operations such as “and,”
“exclusive-or,” and so on, because arithmetic has a very long history. But we will
see that Boolean operations on binary numbers deserve to be much better known.
Indeed, they’re an important component of every good programmer’s toolkit.

Early machine designers provided fullword bitwise operations in their
computers primarily because such instructions could be included in a machine’s
repertoire almost for free. Binary logic seemed to be potentially useful, although
only a few applications were originally foreseen. For example, the EDSAC
computer, completed in 1949, included a “collate” command that essentially
performed the operation $z \gets z + (x \,\&\, y)$, where z was the accumulator, x was the
multiplier register, and y was a specified word in memory; it was used for unpacking
data. The Manchester Mark I computer, built at about the same time, included
not only bitwise AND, but also OR and XOR. When Alan Turing wrote its first
programming manual in 1950, he remarked that bitwise NOT can be obtained
by using XOR (denoted ‘≢’) in combination with a row of 1s. R. A. Brooker,
who extended Turing’s manual in 1952 when the Mark II computer was being
designed, remarked further that OR could be used “to round off a number by
forcing 1 into its least significant digit position.” By this time the Mark II, which
was to become the prototype of the Ferranti Mercury, had also acquired new
instructions for sideways addition and for the position of the most significant 1.

Keith Tocher published an unusual application of AND and OR in 1954,
which has subsequently been reinvented frequently (see exercise 85). And during
the ensuing decades, programmers have gradually discovered that bitwise
operations can be amazingly useful. Many of these tricks have remained part of
the folklore; the time is now ripe to take advantage of what has been learned.

A trick is a clever idea that can be used once, while a technique is a trick
that can be used at least twice. We will see in this section that tricks tend to
evolve naturally into techniques.

Enriched arithmetic. Let’s begin by officially defining bitwise operations on
integers so that, if $x = (\ldots x_2 x_1 x_0)_2$, $y = (\ldots y_2 y_1 y_0)_2$, and $z = (\ldots z_2 z_1 z_0)_2$
in binary notation, we have

$$x \,\&\, y = z \iff x_k \wedge y_k = z_k, \ \text{for all } k \ge 0; \tag{1}$$
$$x \mid y = z \iff x_k \vee y_k = z_k, \ \text{for all } k \ge 0; \tag{2}$$
$$x \oplus y = z \iff x_k \oplus y_k = z_k, \ \text{for all } k \ge 0. \tag{3}$$

(It would be tempting to write ‘$x \wedge y$’ instead of $x \,\&\, y$, and ‘$x \vee y$’ instead of $x \mid y$; but
when we study optimization problems we’ll find it better to reserve the notations
$x \wedge y$ and $x \vee y$ for $\min(x, y)$ and $\max(x, y)$, respectively.) Thus, for example,

$$5 \,\&\, 11 = 1, \qquad 5 \mid 11 = 15, \qquad\text{and}\qquad 5 \oplus 11 = 14,$$

since $5 = (0101)_2$, $11 = (1011)_2$, $1 = (0001)_2$, $15 = (1111)_2$, and $14 = (1110)_2$.

Negative integers are to be thought of in this connection as infinite-precision
numbers in two’s complement notation, having infinitely many 1s at the left; for
example, $-5$ is $(\ldots 1111011)_2$. Such infinite-precision numbers are a special case
of 2-adic integers, which are discussed in exercise 4.1–31, and in fact the operators
&, |, ⊕ make perfect sense when they are applied to arbitrary 2-adic numbers.

Mathematicians have never paid much attention to the properties of & and |
as operations on integers. But the third operation, ⊕, has a venerable history,
because it describes a winning strategy in the game of nim (see exercises 8–16).
For this reason $x \oplus y$ has often been called the “nim sum” of the integers x and y.
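As a quick check of the worked example, C’s `&`, `|`, and `^` operators behave exactly like the text’s &, |, and ⊕ when integers are stored in two’s complement. The sketch below assumes 64-bit `int64_t` values as a finite stand-in for the infinite-precision numbers of the text; the helper names `nim_sum` and `low_bits` are hypothetical.

```c
#include <stdint.h>

/* The nim sum of (3) is simply C's exclusive-or operator. */
int64_t nim_sum(int64_t x, int64_t y) {
    return x ^ y;
}

/* Low n bits of x (0 < n < 64), read as an unsigned number.  For a
   negative x these are the rightmost n bits of its infinite two's
   complement expansion; for example -5 = (...1111011)_2 yields
   11111011 when n = 8.  Converting a negative value to uint64_t is
   defined in C as reduction modulo 2^64, so this is portable. */
uint64_t low_bits(int64_t x, int n) {
    return (uint64_t)x & ((UINT64_C(1) << n) - 1);
}
```

With these definitions `nim_sum(5, 11)` is 14 and `low_bits(-5, 8)` is `0xFB`, matching the binary expansions given above.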
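All three of the basic bitwise operations turn out to have many useful
properties. For example, every relation between ∧, ∨, and ⊕ that we studied in
Section 7.1.1 is automatically inherited by &, |, and ⊕ on integers, since the
relation holds in every bit position. We might as well recap the main identities here:

$$x \,\&\, y = y \,\&\, x, \qquad x \mid y = y \mid x, \qquad x \oplus y = y \oplus x; \tag{4}$$
$$(x \,\&\, y) \,\&\, z = x \,\&\, (y \,\&\, z), \quad (x \mid y) \mid z = x \mid (y \mid z), \quad (x \oplus y) \oplus z = x \oplus (y \oplus z); \tag{5}$$
$$(x \mid y) \,\&\, z = (x \,\&\, z) \mid (y \,\&\, z), \qquad (x \,\&\, y) \mid z = (x \mid z) \,\&\, (y \mid z); \tag{6}$$
$$(x \oplus y) \,\&\, z = (x \,\&\, z) \oplus (y \,\&\, z); \tag{7}$$
$$(x \,\&\, y) \mid x = x, \qquad (x \mid y) \,\&\, x = x; \tag{8}$$
$$(x \,\&\, y) \oplus (x \mid y) = x \oplus y; \tag{9}$$
$$x \,\&\, 0 = 0, \qquad x \mid 0 = x, \qquad x \oplus 0 = x; \tag{10}$$
$$x \,\&\, x = x, \qquad x \mid x = x, \qquad x \oplus x = 0; \tag{11}$$
$$x \,\&\, {-1} = x, \qquad x \mid {-1} = -1, \qquad x \oplus {-1} = \bar x; \tag{12}$$
$$x \,\&\, \bar x = 0, \qquad x \mid \bar x = -1, \qquad x \oplus \bar x = -1; \tag{13}$$
$$\overline{x \,\&\, y} = \bar x \mid \bar y, \qquad \overline{x \mid y} = \bar x \,\&\, \bar y, \qquad \overline{x \oplus y} = \bar x \oplus y = x \oplus \bar y. \tag{14}$$

The notation $\bar x$ in (12), (13), and (14) stands for bitwise complementation of x,
namely $(\ldots \bar x_2 \bar x_1 \bar x_0)_2$, also written ~x. Notice that (12) and (13) aren’t quite
the same as 7.1.1–(10) and 7.1.1–(18); we must now use $-1 = (\ldots 1111)_2$ instead
of $1 = (\ldots 0001)_2$ in order to make the formulas bitwise correct.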
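Because each identity holds bit by bit, a brute-force check over a small range of two’s complement integers makes a cheap sanity test. This is an illustrative sketch, not part of the text’s development; it assumes 64-bit `int64_t` arithmetic, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Checks identities (4)-(14) for one triple (x, y, z).  C's ~x is the
   complement written x-bar in the text, and -1 is the all-ones word.
   Every bitwise subexpression is parenthesized because == binds more
   tightly than &, ^, and | in C. */
bool identities_hold(int64_t x, int64_t y, int64_t z) {
    return (x & y) == (y & x) && (x | y) == (y | x) && (x ^ y) == (y ^ x)   /* (4)  */
        && ((x & y) & z) == (x & (y & z))                                   /* (5)  */
        && ((x | y) | z) == (x | (y | z))
        && ((x ^ y) ^ z) == (x ^ (y ^ z))
        && ((x | y) & z) == ((x & z) | (y & z))                             /* (6)  */
        && ((x & y) | z) == ((x | z) & (y | z))
        && ((x ^ y) & z) == ((x & z) ^ (y & z))                             /* (7)  */
        && ((x & y) | x) == x && ((x | y) & x) == x                         /* (8)  */
        && ((x & y) ^ (x | y)) == (x ^ y)                                   /* (9)  */
        && (x & 0) == 0 && (x | 0) == x && (x ^ 0) == x                    /* (10) */
        && (x & x) == x && (x | x) == x && (x ^ x) == 0                     /* (11) */
        && (x & -1) == x && (x | -1) == -1 && (x ^ -1) == ~x                /* (12) */
        && (x & ~x) == 0 && (x | ~x) == -1 && (x ^ ~x) == -1                /* (13) */
        && ~(x & y) == (~x | ~y) && ~(x | y) == (~x & ~y)                   /* (14) */
        && ~(x ^ y) == (~x ^ y) && ~(x ^ y) == (x ^ ~y);
}

/* Exhaustive check over every triple drawn from [lo, hi]. */
bool identities_hold_on_range(int64_t lo, int64_t hi) {
    for (int64_t x = lo; x <= hi; x++)
        for (int64_t y = lo; y <= hi; y++)
            for (int64_t z = lo; z <= hi; z++)
                if (!identities_hold(x, y, z))
                    return false;
    return true;
}
```

Calling `identities_hold_on_range(-8, 8)` covers negative operands too, which exercises the role of −1 in (12) and (13).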

We say that x is contained in y, written $x \subseteq y$ or $y \supseteq x$, if the individual
bits of x and y satisfy $x_k \le y_k$ for all $k \ge 0$. Thus

$$x \subseteq y \iff x \,\&\, y = x \iff x \mid y = y \iff x \,\&\, \bar y = 0. \tag{15}$$

Of course we needn’t use bitwise operations only in connection with each
other; we can combine them with all the ordinary operations of arithmetic. For
example, from the relation $x + \bar x = (\ldots 1111)_2 = -1$ we can deduce the formula

$$-x = \bar x + 1, \tag{16}$$

which turns out to be extremely important. Replacing x by $x - 1$ gives also

$$-x = \overline{x - 1}; \tag{17}$$

and in general we can reduce subtraction to complementation and addition:

$$x - y = \overline{\bar x + y}. \tag{18}$$
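The containment test (15) and the complement-based formulas (16)–(18) translate directly into C. In this sketch the additions are carried out in `uint64_t` so that wraparound modulo $2^{64}$ is well defined; converting the result back to `int64_t` assumes two’s complement, which every mainstream C implementation uses. All function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Containment (15): every 1 bit of x is also a 1 bit of y. */
bool contained_in(int64_t x, int64_t y) {
    return (x & ~y) == 0;
}

/* The three characterizations in (15) should always agree. */
bool containment_forms_agree(int64_t x, int64_t y) {
    bool a = (x & y) == x, b = (x | y) == y, c = (x & ~y) == 0;
    return a == b && b == c;
}

/* (16): -x = ~x + 1 */
int64_t neg_16(int64_t x) {
    return (int64_t)((uint64_t)~x + 1);
}

/* (17): -x = ~(x - 1) */
int64_t neg_17(int64_t x) {
    return ~(int64_t)((uint64_t)x - 1);
}

/* (18): x - y = ~(~x + y), subtraction from complement and addition */
int64_t sub_18(int64_t x, int64_t y) {
    return ~(int64_t)((uint64_t)~x + (uint64_t)y);
}
```

For instance, `sub_18(3, 10)` forms $\bar 3 + 10 = -4 + 10 = 6$ and complements it to get −7.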


We often want to shift binary numbers to the left or right. These operations
are equivalent to multiplication and division by powers of 2, but it is convenient
to have special notations for them:

$$x \ll k = x \text{ shifted left } k \text{ bits} = \lfloor 2^k x \rfloor; \tag{19}$$
$$x \gg k = x \text{ shifted right } k \text{ bits} = \lfloor 2^{-k} x \rfloor. \tag{20}$$

Here k can be any integer, possibly negative. In particular we have

$$x \ll (-k) = x \gg k \qquad\text{and}\qquad x \gg (-k) = x \ll k, \tag{21}$$

for every infinite-precision number x. Also $(x \,\&\, y) \ll k = (x \ll k) \,\&\, (y \ll k)$, etc.
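C’s own shift operators do not quite match (19)–(21): shifting by a negative amount is undefined, left-shifting a negative value is undefined, and right-shifting a negative value is implementation-defined. The following sketch implements the floor-based definitions for 64-bit values under the stated range assumptions; the function names are hypothetical.

```c
#include <stdint.h>

/* floor(x / 2^k) for 0 <= k < 63.  For x < 0 we shift the nonnegative
   ~x instead and use the identity floor(x / 2^k) = ~(~x >> k). */
static int64_t floor_shift_right(int64_t x, int k) {
    return x >= 0 ? x >> k : ~(~x >> k);
}

/* x << k = floor(2^k x), (19), for any k with |k| < 63; a negative k
   becomes a right shift by (21).  Assumes 2^k x fits in 64 bits. */
int64_t shift_left(int64_t x, int k) {
    if (k >= 0)
        return x * ((int64_t)1 << k);   /* multiply, since << on a negative int is undefined in C */
    return floor_shift_right(x, -k);
}

/* x >> k = floor(2^-k x), (20); by (21) this equals x << (-k). */
int64_t shift_right(int64_t x, int k) {
    return shift_left(x, -k);
}
```

So `shift_right(-5, 1)` gives $\lfloor -5/2 \rfloor = -3$, the same value as `shift_left(-5, -1)`.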

When bitwise operations are combined with addition, subtraction,
multiplication, and/or shifting, extremely intricate results can arise, even when the
formulas are quite short. A taste of the possibilities can be seen, for example,
in Fig. 11. Furthermore, such formulas do not merely produce purposeless,
chaotic behavior: A famous chain of operations known as “Gosper’s hack,” first
published in 1972, opened people’s eyes to the fact that a large number of useful
and nontrivial functions can be computed rapidly (see exercise 20). Our goal in
this section is to explore how such efficient constructions might be discovered.



Fig. 11. A small portion of the patchwork quilt defined by the bitwise function
$f(x, y) = \bigl((x \oplus \bar y) \,\&\, ((y - 350) \gg 3)\bigr)^2$; the square cell in column x
and row y is painted white or black according as the value of
$((f(x, y) \gg 12) \,\&\, 1)$ is 0 or 1. (Design by D. Sleator, 1976; see also exercise 18.)


Packing and unpacking. We studied algorithms for multiple-precision
arithmetic in Section 4.3.1, dealing with situations where integers are too large to fit in
a single word of memory or a single computer register. But the opposite situation,
when integers are significantly smaller than the capacity of one computer word, is
actually much more common; D. H. Lehmer called this “fractional precision.” We
can often deal with several integers at once, by packing them into a single word.

For example, a date x that consists of a year number y, a month number m,
and a day number d can be represented by using 4 bits for m and 5 bits for d:

$$x = (((y \ll 4) + m) \ll 5) + d. \tag{22}$$

We’ll see below that many operations can be performed directly on dates in this
packed form. For example, $x < x'$ when date x precedes date $x'$. But if necessary
the individual components $(y, m, d)$ can readily be unpacked when x is given:

$$d = x \bmod 32, \qquad m = (x \gg 5) \bmod 16, \qquad y = x \gg 9. \tag{23}$$

And these “mod” operations do not require division, because of the important
law

$$x \bmod 2^n = x \,\&\, (2^n - 1). \tag{24}$$

For example, we have $d = x \,\&\, 31$ in (22) and (23).
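Formulas (22)–(24) become a few lines of C. This is a minimal sketch assuming nonnegative years and in-range months (1–12) and days (1–31), packed into a `uint32_t`; the function names are hypothetical.

```c
#include <stdint.h>

/* Packed date (22): 5 bits for the day d, 4 bits for the month m,
   and the year y in the remaining high-order bits. */
uint32_t pack_date(uint32_t y, uint32_t m, uint32_t d) {
    return (((y << 4) + m) << 5) + d;
}

/* Unpacking (23), with each "mod 2^n" done as "& (2^n - 1)" by (24). */
uint32_t date_day(uint32_t x)   { return x & 31; }         /* x mod 32        */
uint32_t date_month(uint32_t x) { return (x >> 5) & 15; }  /* (x >> 5) mod 16 */
uint32_t date_year(uint32_t x)  { return x >> 9; }         /* x >> 9          */
```

Because the year occupies the most significant bits, followed by the month and then the day, an ordinary unsigned comparison of two packed dates orders them chronologically, as the text notes.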

Such packing of data obviously saves space in memory, and it also saves time: 
We can more quickly move or copy items of data from one place to another when 
they’ve been packed together. Moreover, modern computers run considerably
faster when they operate on numbers that fit into a cache memory of limited size. 

The ultimate packing density is achieved when we have 1-bit items, because
we can then cram 64 of them into a single 64-bit word. Suppose, for example,
that we want a table of all odd prime numbers less than 1024, so that we can
easily decide the primality of a small integer. No problem; only eight 64-bit
numbers are required:

    P_0 = 0111011011010011001011010010011001011001010010001011011010000001,
    P_1 = 0100110000110010010100100110000110110000010000010110100110000100,
    P_2 = 1001001100101100001000000101101000000100100001101001000100100101,
    P_3 = 0010001010001000011000011001010010001011010000010001010001010010,
    P_4 = 0000110000000010010000100100110010000100100110010010110000010000,
    P_5 = 1101001001100000101001000100001000100001000100100101000100101000,
    P_6 = 1010000001000010000011000011011000010000001011010000001011010000,
    P_7 = 0000010100010000100010100100100000010100100100010010000010100110.

To test whether $2k + 1$ is prime, for $0 \le k < 512$, we simply compute

$$P_{\lfloor k/64 \rfloor} \ll (k \,\&\, 63) \tag{25}$$

in a 64-bit register, and see if the leftmost bit is 1. For example, the following
MMIX instructions will do the job, if register pbase holds the address of $P_0$:

    SRU  $0,k,3        $0 ← ⌊k/8⌋ (i.e., k ≫ 3).
    LDOU $1,pbase,$0   $1 ← P_⌊$0/8⌋.
    AND  $0,k,#3f      $0 ← k mod 64 (i.e., k & #3f).           (26)
    SLU  $1,$1,$0      $1 ← ($1 ≪ $0) mod 2^64.
    BN   $1,PRIME      Branch to PRIME if s($1) < 0.  ▮

Notice that the leftmost bit of a register is 1 if and only if the register contents 
are negative. 

We could equally well pack the bits from right to left in each word:

    Q_0 = 1000000101101101000100101001101001100100101101001100101101101110,
    Q_1 = 0010000110010110100000100000110110000110010010100100110000110010,
    Q_2 = 1010010010001001011000010010000001011010000001000011010011001001,
    Q_3 = 0100101000101000100000101101000100101001100001100001000101000100,
    Q_4 = 0000100000110100100110010010000100110010010000100100000000110000,
    Q_5 = 0001010010001010010010001000010001000010001001010000011001001011,
    Q_6 = 0000101101000000101101000000100001101100001100000100001000000101,
    Q_7 = 0110010100000100100010010010100000010010010100010000100010100000;

here $Q_j = P_j^R$, the bits of $P_j$ in reverse order. Instead of shifting left as in (25), we now shift right,

$$Q_{\lfloor k/64 \rfloor} \gg (k \,\&\, 63), \tag{27}$$

and look at the rightmost bit of the result. The last two lines of (26) become

    SRU  $1,$1,$0      $1 ← $1 ≫ $0.
    BOD  $1,PRIME      Branch to PRIME if $1 is odd.  ▮          (28)

(And of course we use qbase instead of pbase.) Either way, the classic sieve of
Eratosthenes will readily set up the basic table entries $P_j$ or $Q_j$ (see exercise 24).
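Exercise 24’s construction is not reproduced here; as a stand-in, the following C sketch uses a plain sieve of Eratosthenes to fill both tables and then applies the shift-and-test rules (25) and (27). The array names `P_table` and `Q_table` and the function names are hypothetical, and the MMIX code above remains the text’s own version.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Bit k of the tables stands for the odd number 2k+1, 0 <= k < 512.
   P_table packs big-endian (a_{64j} is the leftmost bit of P_table[j]);
   Q_table packs little-endian (a_{64j} is the rightmost bit). */
uint64_t P_table[8], Q_table[8];

void build_prime_tables(void) {
    bool composite[1024];
    memset(composite, 0, sizeof composite);
    for (int p = 3; p * p < 1024; p += 2)            /* sieve odd numbers only */
        if (!composite[p])
            for (int m = p * p; m < 1024; m += 2 * p)
                composite[m] = true;
    memset(P_table, 0, sizeof P_table);
    memset(Q_table, 0, sizeof Q_table);
    for (int k = 1; k < 512; k++)                    /* k = 0 stands for 1, not prime */
        if (!composite[2 * k + 1]) {
            P_table[k / 64] |= UINT64_C(1) << (63 - k % 64);
            Q_table[k / 64] |= UINT64_C(1) << (k % 64);
        }
}

/* (25): shift left by k & 63 and test the leftmost (sign) bit. */
bool is_prime_P(int k) {
    return (P_table[k >> 6] << (k & 63)) >> 63;
}

/* (27): shift right by k & 63 and test the rightmost bit. */
bool is_prime_Q(int k) {
    return (Q_table[k >> 6] >> (k & 63)) & 1;
}
```

After `build_prime_tables()`, `P_table[0]` in hexadecimal is `0x76D32D265948B681`, which is $P_0$ above, and `Q_table[0]` is its bit reversal.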


Table 1

THE BIG-ENDIAN VIEW OF A 32-BYTE MEMORY

    octa 0   tetra 0   wyde 0   byte 0: a_0 … a_7        byte 1: a_8 … a_15
                       wyde 2   byte 2: a_16 … a_23      byte 3: a_24 … a_31
             tetra 4   wyde 4   byte 4: a_32 … a_39      byte 5: a_40 … a_47
                       wyde 6   byte 6: a_48 … a_55      byte 7: a_56 … a_63
    octa 8   tetra 8   wyde 8   byte 8: a_64 … a_71      byte 9: a_72 … a_79
                       wyde 10  byte 10: a_80 … a_87     byte 11: a_88 … a_95
             tetra 12  wyde 12  byte 12: a_96 … a_103    byte 13: a_104 … a_111
                       wyde 14  byte 14: a_112 … a_119   byte 15: a_120 … a_127
    octa 16  tetra 16  wyde 16  byte 16: a_128 … a_135   byte 17: a_136 … a_143
                       wyde 18  byte 18: a_144 … a_151   byte 19: a_152 … a_159
             tetra 20  wyde 20  byte 20: a_160 … a_167   byte 21: a_168 … a_175
                       wyde 22  byte 22: a_176 … a_183   byte 23: a_184 … a_191
    octa 24  tetra 24  wyde 24  byte 24: a_192 … a_199   byte 25: a_200 … a_207
                       wyde 26  byte 26: a_208 … a_215   byte 27: a_216 … a_223
             tetra 28  wyde 28  byte 28: a_224 … a_231   byte 29: a_232 … a_239
                       wyde 30  byte 30: a_240 … a_247   byte 31: a_248 … a_255




Big-endian and little-endian conventions. Whenever we pack bits or bytes
into words, we must decide whether to place them from left to right or from right
to left. The left-to-right convention is called “big-endian,” because the initial
items go into the most significant positions; thus they will have bigger significance
than their successors, when numbers are compared. The right-to-left convention
is called “little-endian”; it puts the first items where little numbers go.

A big-endian approach seems more natural in many cases, because we’re
accustomed to reading and writing from left to right. But a little-endian placement
has advantages too. For example, let’s consider the prime number problem again;
let $a_k = [2k+1 \text{ is prime}]$. Our table entries $\{P_0, P_1, \ldots, P_7\}$ are big-endian, and
we can regard them as the representation of a single multiple-precision integer
that is 512 bits long:

$$(P_0 P_1 \ldots P_7)_{2^{64}} = (a_0 a_1 \ldots a_{511})_2. \tag{29}$$

Similarly, our little-endian table entries represent the multiprecise integer

$$(Q_7 \ldots Q_1 Q_0)_{2^{64}} = (a_{511} \ldots a_1 a_0)_2. \tag{30}$$

The latter integer is mathematically nicer than the former, because it is

$$\sum_{k=0}^{511} 2^k a_k = \sum_{k=0}^{511} 2^k\,[2k+1 \text{ is prime}] = \Bigl(\sum_{k \ge 0} 2^k\,[2k+1 \text{ is prime}]\Bigr) \bmod 2^{512}. \tag{31}$$
Table 2

THE LITTLE-ENDIAN VIEW OF A 32-BYTE MEMORY

    octa 24  tetra 28  wyde 30  byte 31: a_255 … a_248   byte 30: a_247 … a_240
                       wyde 28  byte 29: a_239 … a_232   byte 28: a_231 … a_224
             tetra 24  wyde 26  byte 27: a_223 … a_216   byte 26: a_215 … a_208
                       wyde 24  byte 25: a_207 … a_200   byte 24: a_199 … a_192
    octa 16  tetra 20  wyde 22  byte 23: a_191 … a_184   byte 22: a_183 … a_176
                       wyde 20  byte 21: a_175 … a_168   byte 20: a_167 … a_160
             tetra 16  wyde 18  byte 19: a_159 … a_152   byte 18: a_151 … a_144
                       wyde 16  byte 17: a_143 … a_136   byte 16: a_135 … a_128
    octa 8   tetra 12  wyde 14  byte 15: a_127 … a_120   byte 14: a_119 … a_112
                       wyde 12  byte 13: a_111 … a_104   byte 12: a_103 … a_96
             tetra 8   wyde 10  byte 11: a_95 … a_88     byte 10: a_87 … a_80
                       wyde 8   byte 9: a_79 … a_72      byte 8: a_71 … a_64
    octa 0   tetra 4   wyde 6   byte 7: a_63 … a_56      byte 6: a_55 … a_48
                       wyde 4   byte 5: a_47 … a_40      byte 4: a_39 … a_32
             tetra 0   wyde 2   byte 3: a_31 … a_24      byte 2: a_23 … a_16
                       wyde 0   byte 1: a_15 … a_8       byte 0: a_7 … a_0


Notice, however, that we used $(Q_7 \ldots Q_1 Q_0)_{2^{64}}$ to get this simple result, not
$(Q_0 Q_1 \ldots Q_7)_{2^{64}}$. The other number,

$$(Q_0 Q_1 \ldots Q_7)_{2^{64}} = (a_{63} \ldots a_1 a_0\, a_{127} \ldots a_{65} a_{64}\, a_{191} \ldots a_{385} a_{384}\, a_{511} \ldots a_{449} a_{448})_2,$$

is in fact quite weird, and it has no really nice formula. (See exercise 25.)

Endianness has important consequences, because most computers allow
individual bytes of the memory to be addressed as well as register-sized units. MMIX
has a big-endian architecture; therefore if register x contains the 64-bit number
#0123456789abcdef, and if we use the commands ‘STOU x,0; LDBU y,1’ to
store x into octabyte location 0 and read back the byte in location 1, the result
in register y will be #23. On machines with a little-endian architecture, the
analogous commands would set $y \gets$ #cd instead; #23 would be byte 6.
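The byte-order example can be reproduced in portable C by computing byte i arithmetically under each convention, while a third helper copies the value into memory to reveal what the host machine itself does. This is a sketch with hypothetical function names; `memcpy` plays the role of the `STOU`/`LDBU` pair.

```c
#include <stdint.h>
#include <string.h>

/* Byte i (0 <= i < 8) of x as a big-endian machine such as MMIX would
   store it: byte 0 holds the most significant 8 bits. */
unsigned byte_big_endian(uint64_t x, int i) {
    return (x >> (56 - 8 * i)) & 0xff;
}

/* Byte i of x as a little-endian machine would store it: byte 0 holds
   the least significant 8 bits. */
unsigned byte_little_endian(uint64_t x, int i) {
    return (x >> (8 * i)) & 0xff;
}

/* What this host actually does: store x into memory, then read back
   byte i, the analogue of 'STOU x,0; LDBU y,i'. */
unsigned byte_on_this_host(uint64_t x, int i) {
    unsigned char mem[8];
    memcpy(mem, &x, sizeof mem);
    return mem[i];
}
```

With x = #0123456789abcdef, byte 1 is #23 under the big-endian convention and #cd under the little-endian one, where #23 reappears as byte 6; `byte_on_this_host` returns whichever of #23 or #cd matches the machine it runs on.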

Tables 1 and 2 illustrate the competing “world views” of big-endian and
little-endian aficionados. The big-endian approach is basically top-down, with
bit 0 and byte 0 at the top left; the little-endian approach is basically bottom-up,
with bit 0 and byte 0 at the bottom right. Because of this difference, great care
is necessary when transmitting data from one kind of computer to another, or
when writing programs that are supposed to give equivalent results in both cases.
On the other hand, our example of the Q table for primes shows that we can
perfectly well use a little-endian packing convention on a big-endian computer
like MMIX, or vice versa. The difference is noticeable only when data is loaded
and stored in different-sized chunks, or passed between machines.

Working with the rightmost bits. Big-endian and little-endian approaches
aren’t readily interchangeable in general, because the laws of arithmetic send
signals leftward from the bits that are “least significant.” Some of the most
important bitwise manipulation techniques are based on this fact.

If x is almost any nonzero 2-adic integer, we can write its bits in the form

$$x = (\alpha\, 0\, 1^a\, 1\, 0^b)_2; \tag{32}$$

in other words, x consists of some arbitrary (but infinite) binary string α, followed
by a 0, which is followed by $a + 1$ ones, and followed by b zeros, for some $a \ge 0$
and $b \ge 0$. (The exceptions are when $x = -2^b$ and $a = \infty$.) Consequently

$$\bar x = (\bar\alpha\, 1\, 0^a\, 0\, 1^b)_2, \tag{33}$$
$$x - 1 = (\alpha\, 0\, 1^a\, 0\, 1^b)_2, \tag{34}$$
$$-x = (\bar\alpha\, 1\, 0^a\, 1\, 0^b)_2; \tag{35}$$

and we see that $\bar x + 1 = -x = \overline{x - 1}$, in agreement with (16) and (17). With two
operations we can therefore compute relatives of x in several useful ways:

$$x \,\&\, (x-1) = (\alpha\, 0\, 1^a\, 0\, 0^b)_2 \quad [\text{remove the rightmost 1}]; \tag{36}$$
$$x \,\&\, {-x} = (0^\infty 0\, 0^a\, 1\, 0^b)_2 \quad [\text{extract the rightmost 1}]; \tag{37}$$
$$x \mid {-x} = (1^\infty 1\, 1^a\, 1\, 0^b)_2 \quad [\text{smear the rightmost 1 to the left}]; \tag{38}$$
$$x \oplus {-x} = (1^\infty 1\, 1^a\, 0\, 0^b)_2 \quad [\text{remove and smear it to the left}]; \tag{39}$$
$$x \mid (x-1) = (\alpha\, 0\, 1^a\, 1\, 1^b)_2 \quad [\text{smear the rightmost 1 to the right}]; \tag{40}$$
$$x \oplus (x-1) = (0^\infty 0\, 0^a\, 1\, 1^b)_2 \quad [\text{extract and smear it to the right}]; \tag{41}$$
$$\bar x \,\&\, (x-1) = (0^\infty 0\, 0^a\, 0\, 1^b)_2 \quad [\text{extract, remove, and smear it to the right}]. \tag{42}$$

And two further operations produce yet another variant:

$$((x \mid (x-1)) + 1) \,\&\, x = (\alpha\, 0\, 0^a\, 0\, 0^b)_2 \quad [\text{remove the rightmost run of 1s}]. \tag{43}$$

When x = 0, five of these formulas, namely (36)–(39) and (43), produce 0; the other three give −1.
[Formula (36) is due to Peter Wegner, CACM 3 (1960), 322; and (43) is due to
H. Tim Gladwin, CACM 14 (1971), 407–408. See also Henry S. Warren, Jr.,
CACM 20 (1977), 439–441.]

The quantity b in these formulas, which specifies the number of trailing zeros
in x, is called the ruler function of x and written $\rho x$, because it is related to
the lengths of the tick marks that are often used to indicate fractions of an inch.
In general, $\rho x$ is the exponent of the highest power of 2 that divides x, when
$x \ne 0$; and we define $\rho 0 = \infty$. The recurrence relations

$$\rho(2x + 1) = 0, \qquad \rho(2x) = \rho(x) + 1 \tag{44}$$

also serve to define $\rho x$ for nonzero x. Another handy relation is worthy of note,

$$\rho(x - y) = \rho(x \oplus y). \tag{45}$$
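The rightmost-bit formulas (36)–(43) carry over verbatim to 64-bit words if we use unsigned arithmetic, because `x - 1` and `-x` then wrap modulo $2^{64}$ and agree with the 2-adic formulas in their low 64 bits. The function names in this sketch are hypothetical, and the simple loop for $\rho x$ is only a reference definition, not one of the fast methods discussed below; it returns 64 for x = 0 as a stand-in for ∞.

```c
#include <stdint.h>

uint64_t remove_rightmost_1(uint64_t x)         { return x & (x - 1); }            /* (36) */
uint64_t extract_rightmost_1(uint64_t x)        { return x & -x; }                 /* (37) */
uint64_t smear_rightmost_1_left(uint64_t x)     { return x | -x; }                 /* (38) */
uint64_t remove_and_smear_left(uint64_t x)      { return x ^ -x; }                 /* (39) */
uint64_t smear_rightmost_1_right(uint64_t x)    { return x | (x - 1); }            /* (40) */
uint64_t extract_and_smear_right(uint64_t x)    { return x ^ (x - 1); }            /* (41) */
uint64_t extract_remove_smear_right(uint64_t x) { return ~x & (x - 1); }           /* (42) */
uint64_t remove_rightmost_run(uint64_t x)       { return ((x | (x - 1)) + 1) & x; } /* (43) */

/* Ruler function straight from the definition: the number of trailing
   zeros of x, or 64 when x = 0. */
int ruler(uint64_t x) {
    int r = 0;
    if (x == 0)
        return 64;
    while ((x & 1) == 0) {
        x >>= 1;
        r++;
    }
    return r;
}
```

For x = 184 = $(10111000)_2$, so that a = 2 and b = 3, the eight functions return 176, 8, −8, −16, 191, 15, 7, and 128, where −8 and −16 appear as the unsigned values $2^{64} - 8$ and $2^{64} - 16$.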
The elegant formula $x \,\&\, {-x}$ in (37) allows us to extract the rightmost 1 bit
very nicely, but we often want to identify exactly which bit it is. The ruler
function can be computed in many ways, and the best method often depends
heavily on the computer that is being used. For example, a two-instruction
sequence due to J. Dallos does the job quickly and easily on MMIX:

    SUBU t,x,1; SADD rho,t,x.                                   (46)

We shall discuss here two approaches that do not rely on exotic commands like
SADD; and later, after learning a few more techniques, we’ll consider a third way.

The first general-purpose method makes use of “magic mask” constants $\mu_k$
that prove to be useful in many other applications, namely

$$\mu_0 = (\ldots 101010101010101010101010101010101)_2 = -1/3,$$
$$\mu_1 = (\ldots 100110011001100110011001100110011)_2 = -1/5, \tag{47}$$
$$\mu_2 = (\ldots 100001111000011110000111100001111)_2 = -1/17,$$

and so on. In general $\mu_k$ is the infinite 2-adic fraction $-1/(2^{2^k} + 1)$, because
$(2^{2^k} + 1)\mu_k = (\mu_k \ll 2^k) + \mu_k = (\ldots 11111)_2 = -1$. On a computer that has $2^d$-bit
registers we don’t need infinite precision, of course, so we use the truncated
constants

$$\mu_{d,k} = (2^{2^d} - 1)/(2^{2^k} + 1) \qquad\text{for } 0 \le k < d. \tag{48}$$

These constants are familiar from our study of Boolean evaluation, because they
are the truth tables of the projection functions $x_{n-k}$ (see, for example, 7.1.2–(7)).
When x is a power of 2, we can use these masks to compute

$$\rho x = [x \,\&\, \mu_0 = 0] + 2[x \,\&\, \mu_1 = 0] + 4[x \,\&\, \mu_2 = 0] + 8[x \,\&\, \mu_3 = 0] + \cdots, \tag{49}$$

because $[2^j \,\&\, \mu_k = 0] = j_k$ when $j = (\ldots j_3 j_2 j_1 j_0)_2$. Thus, on a $2^d$-bit computer,
we can start with $\rho \gets 0$ and $y \gets x \,\&\, {-x}$; then set $\rho \gets \rho + 2^k$ if $y \,\&\, \mu_{d,k} = 0$, for
$0 \le k < d$. This procedure gives $\rho = \rho x$ when $x \ne 0$. (It also gives $\rho 0 = 2^d - 1$,
an anomalous value that may need to be corrected; see exercise 30.)

For example, the corresponding MMIX program might look like this:

    m0 GREG #5555555555555555  ;m1 GREG #3333333333333333;
    m2 GREG #0f0f0f0f0f0f0f0f  ;m3 GREG #00ff00ff00ff00ff;
    m4 GREG #0000ffff0000ffff  ;m5 GREG #00000000ffffffff;
    NEGU y,x;    AND y,x,y;     AND q,y,m5;   ZSZ rho,q,32;
    AND q,y,m4;  ADD t,rho,16;  CSZ rho,q,t;
    AND q,y,m3;  ADD t,rho,8;   CSZ rho,q,t;                    (50)
    AND q,y,m2;  ADD t,rho,4;   CSZ rho,q,t;
    AND q,y,m1;  ADD t,rho,2;   CSZ rho,q,t;
    AND q,y,m0;  ADD t,rho,1;   CSZ rho,q,t;

total time = 19υ. Or we could replace the last three lines by

    SRU y,y,rho;  LDB t,rhotab,y;  ADD rho,rho,t                (51)

where rhotab points to the beginning of an appropriate 129-byte table. The
total time would then be μ + 13υ.
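Procedure (49) can be sketched in C for a 64-bit word (d = 6) using the truncated masks $\mu_{6,k}$ of (48), the same constants as m0–m5 in (50). The loop replaces the unrolled conditional moves of the MMIX version; the names `MU` and `ruler_by_masks` are hypothetical.

```c
#include <stdint.h>

/* Truncated magic masks mu_{6,k} of (48): MU[k] = (2^64 - 1)/(2^(2^k) + 1). */
static const uint64_t MU[6] = {
    UINT64_C(0x5555555555555555), UINT64_C(0x3333333333333333),
    UINT64_C(0x0f0f0f0f0f0f0f0f), UINT64_C(0x00ff00ff00ff00ff),
    UINT64_C(0x0000ffff0000ffff), UINT64_C(0x00000000ffffffff),
};

/* Ruler function via (49): isolate the rightmost 1 as y = x & -x, then
   add 2^k whenever y & mu_{6,k} = 0.  As the text warns, x = 0 yields
   the anomalous value 2^6 - 1 = 63. */
int ruler_by_masks(uint64_t x) {
    uint64_t y = x & -x;
    int r = 0;
    for (int k = 0; k < 6; k++)
        if ((y & MU[k]) == 0)
            r += 1 << k;
    return r;
}
```

For example, $y = 2^{48}$ misses only the masks $\mu_{6,4}$ and $\mu_{6,5}$, so the result is 16 + 32 = 48.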


The second general-purpose approach to the computation of $\rho x$ is quite different. On a 64-bit machine it starts as before, with $y \leftarrow x \,\&\, {-x}$; but then it simply sets

$$\rho \leftarrow \mathit{decode}\bigl[((a \cdot y) \bmod 2^{64}) \gg 58\bigr], \qquad (52)$$

where $a$ is a suitable multiplier and $\mathit{decode}$ is a suitable 64-byte table. The constant $a = (a_{63} \ldots a_1 a_0)_2$ must have the property that its 64 substrings

$$a_{63}a_{62}\ldots a_{58},\; a_{62}a_{61}\ldots a_{57},\; \ldots,\; a_5a_4\ldots a_0,\; a_4\ldots a_0 0,\; \ldots,\; a_0 00000$$

are distinct. Exercise 2.3.4.2–23 shows that many such "de Bruijn cycles" exist; for example, we can use M. H. Martin's constant #03f79d71b4ca8b09, which is discussed in exercise 3.2.2–17. The decoding table $\mathit{decode}[0], \ldots, \mathit{decode}[63]$ is

    00, 01, 56, 02, 57, 49, 28, 03, 61, 58, 42, 50, 38, 29, 17, 04,
    62, 47, 59, 36, 45, 43, 51, 22, 53, 39, 33, 30, 24, 18, 12, 05,
    63, 55, 48, 27, 60, 41, 37, 16, 46, 35, 44, 21, 52, 32, 23, 11,
    54, 26, 40, 15, 34, 20, 31, 10, 25, 14, 19, 09, 13, 08, 07, 06.      (53)

[This technique was devised in 1997 by M. Lauter, and independently by C. E. Leiserson, H. Prokop, and K. H. Randall a few months later (unpublished). David Seal had used a similar method in 1994, with a larger decoding table.]
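
Here is a Python sketch of (52). It assumes 64-bit arithmetic emulated by masking. Instead of typing in table (53), the sketch builds it from Martin's constant by exploiting the de Bruijn property: the top six bits of $a \cdot 2^j$ are distinct for $0 \le j < 64$. The names `A`, `DECODE`, and `rho_debruijn` are ours.

```python
MASK64 = (1 << 64) - 1
A = 0x03F79D71B4CA8B09            # M. H. Martin's de Bruijn constant, from the text

# Build decode[] from the de Bruijn property: the top 6 bits of A * 2^j
# are distinct for 0 <= j < 64, so they can index a 64-entry table.
DECODE = [0] * 64
for j in range(64):
    DECODE[((A << j) & MASK64) >> 58] = j

def rho_debruijn(x: int) -> int:
    """rho(x) for 0 < x < 2^64, via (52)."""
    y = x & -x                                   # rightmost 1 bit of x
    return DECODE[((A * y) & MASK64) >> 58]      # top 6 bits of the 64-bit product
```

Because the table is generated rather than transcribed, a typo in the constant shows up immediately as a collision. In that case `DECODE` would no longer be a permutation of 0..63.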


Working with the leftmost bits. The function $\lambda x = \lfloor \lg x \rfloor$, which is dual to $\rho x$ because it locates the leftmost 1 when $x > 0$, was introduced in Eq. 4.6.3–(6). It satisfies the recurrence

$$\lambda 1 = 0; \qquad \lambda(2x) = \lambda(2x+1) = \lambda(x) + 1 \quad\text{for } x > 0; \qquad (54)$$

and it is undefined when $x \le 0$. What is a good way to compute it? Once again MMIX provides a quick-but-tricky solution:

    FLOTU y,ROUND_DOWN,x; SUB y,y,fone; SR lam,y,52        (55)

where fone = #3ff0000000000000 is the floating-point representation of 1.0. (Total time $6\upsilon$.) This code floats $x$, then extracts the exponent.

But if floating-point conversion is not readily available, a binary reduction strategy works fairly well on a $2^d$-bit machine. We can start with $\lambda \leftarrow 0$ and $y \leftarrow x$. Then, for $k = d-1, \ldots, 1, 0$, we set $\lambda \leftarrow \lambda + 2^k$ and $y \leftarrow y \gg 2^k$ whenever $y \,\&\, \bar\mu_{d,k} \ne 0$; alternatively, we can stop as soon as $k$ is small enough that a short table can finish the job. The MMIX code analogous to (50) and (51) is now

    ANDN q,x,m5; SRU z,x,32; SET y,x;      CSNZ y,q,z; ZSNZ lam,q,32;
    ANDN q,y,m4; SRU z,y,16; ADD t,lam,16; CSNZ y,q,z; CSNZ lam,q,t;
    ANDN q,y,m3; SRU z,y,8;  ADD t,lam,8;  CSNZ y,q,z; CSNZ lam,q,t;
    LDB t,lamtab,y; ADD lam,lam,t;                                  (56)

and the total time is $\mu + 17\upsilon$. In this case table lamtab has 256 entries, namely $\lambda x$ for $0 \le x < 256$. Notice that the "conditional set" (CS) and "zero or set" (ZS) instructions have been used here instead of branch instructions. They tend to save time, even though they've made the program slightly longer.
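
The binary reduction strategy can be sketched in Python. The sketch assumes $d = 6$ and runs all six steps instead of finishing with a table. The complemented mask $\bar\mu_{6,k}$ is formed with `~` and then truncated to 64 bits. The names are ours.

```python
MASK64 = (1 << 64) - 1
MU = [MASK64 // ((1 << (1 << k)) + 1) for k in range(6)]

def lam_binary(x: int) -> int:
    """floor(lg x) for 0 < x < 2^64 by binary reduction; undefined for x == 0."""
    if not 0 < x <= MASK64:
        raise ValueError("lambda x is defined here only for 0 < x < 2^64")
    lam, y = 0, x
    for k in range(5, -1, -1):            # k = d-1, ..., 1, 0 with d = 6
        if y & ~MU[k] & MASK64:           # y has a 1 outside mu_{6,k}
            lam += 1 << k
            y >>= 1 << k
    return lam
```

Stopping after $k = 3$ and indexing a 256-entry table with the remaining `y` gives the shape of (56).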


There appears to be no simple way to extract the leftmost 1 bit that appears in a register, analogous to the trick by which we extracted the rightmost 1 in (37). For this purpose we could compute $y \leftarrow \lambda x$ and then $1 \ll y$, if $x \ne 0$; but a binary "smearing right" method is somewhat shorter and faster:

$$\text{Set } y \leftarrow x, \text{ then } y \leftarrow y \mid (y \gg 2^k) \text{ for } 0 \le k < d.$$
$$\text{The leftmost 1 bit of } x \text{ is then } y - (y \gg 1). \qquad (57)$$

[These non-floating-point methods have been suggested by H. S. Warren, Jr.]
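
Here is a direct Python transcription of (57). It assumes a $2^d$-bit word with $d = 6$ by default; the function name is ours.

```python
def leftmost_bit(x: int, d: int = 6) -> int:
    """Leftmost 1 bit of a 2^d-bit x (0 if x == 0), by 'smearing right' as in (57)."""
    y = x
    for k in range(d):
        y |= y >> (1 << k)   # now the 2^(k+1) positions from the leftmost 1 downward are all 1
    return y - (y >> 1)      # y = 2^(lambda x + 1) - 1, so this leaves just the top bit
```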

Other operations at the left of a register, like removing the leftmost run of 1s, are harder yet; see exercise 39. But there is a remarkably simple, machine-independent way to determine whether or not $\lambda x = \lambda y$, given unsigned integers $x$ and $y$, in spite of the fact that we can't compute $\lambda x$ or $\lambda y$ quickly:

$$\lambda x = \lambda y \quad\text{if and only if}\quad x \oplus y \le x \,\&\, y. \qquad (58)$$

[See exercise 40. This elegant relation was discovered by W. C. Lynch in 2006.] We will use (58) below, to devise another way to compute $\lambda x$.
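
Relation (58) needs only one comparison, as this Python sketch shows. The function name is ours. For $x = y = 0$ the sketch returns True, since neither operand has a 1 bit.

```python
def same_lambda(x: int, y: int) -> bool:
    """Lynch's test (58): do unsigned x and y have their leftmost 1 bits in the same place?"""
    return (x ^ y) <= (x & y)
```

For example, `same_lambda(5, 6)` is True because both values lie in $[4, 8)$, while `same_lambda(4, 8)` is False.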

Sideways addition. Binary $n$-bit numbers $x = (x_{n-1} \ldots x_1 x_0)_2$ are often used to represent subsets $X$ of the $n$-element universe $\{0, 1, \ldots, n-1\}$, with $k \in X$ if and only if $2^k \subseteq x$. The functions $\lambda x$ and $\rho x$ then represent the largest and smallest elements of $X$. The function

$$\nu x = x_{n-1} + \cdots + x_1 + x_0, \qquad (59)$$

which is called the "sideways sum" or "population count" of $x$, also has obvious importance in this connection, because it represents the cardinality $|X|$, namely the number of elements in $X$. This function, which we considered in 4.6.3–(7), satisfies the recurrence

$$\nu 0 = 0; \qquad \nu(2x) = \nu(x) \quad\text{and}\quad \nu(2x+1) = \nu(x) + 1, \quad\text{for } x \ge 0. \qquad (60)$$

It also has an interesting connection with the ruler function (exercise 1.2.5–11):

$$\rho x = 1 + \nu(x-1) - \nu x; \quad\text{equivalently,}\quad \sum_{k=1}^{n} \rho k = n - \nu n. \qquad (61)$$

The first textbook on programming, The Preparation of Programs for an Electronic Digital Computer by Wilkes, Wheeler, and Gill, second edition (Reading, Mass.: Addison–Wesley, 1957), 155, 191–193, presented an interesting subroutine for sideways addition due to D. B. Gillies and J. C. P. Miller. Their method was devised for the 35-bit numbers of the EDSAC, but it is readily converted to the following 64-bit procedure for $\nu x$ when $x = (x_{63} \ldots x_1 x_0)_2$:

Set $y \leftarrow x - ((x \gg 1) \,\&\, \mu_0)$. (Now $y = (u_{31} \ldots u_1 u_0)_4$, where $u_j = x_{2j+1} + x_{2j}$.)
Set $y \leftarrow (y \,\&\, \mu_1) + ((y \gg 2) \,\&\, \mu_1)$. (Now $y = (v_{15} \ldots v_1 v_0)_{16}$, $v_j = u_{2j+1} + u_{2j}$.)
Set $y \leftarrow (y + (y \gg 4)) \,\&\, \mu_2$. (Now $y = (w_7 \ldots w_1 w_0)_{256}$, $w_j = v_{2j+1} + v_{2j}$.)
Finally $\nu \leftarrow ((a \cdot y) \bmod 2^{64}) \gg 56$, where $a = (11111111)_{256}$.   (62)

The last step cleverly computes $y \bmod 255 = w_7 + \cdots + w_1 + w_0$ via multiplication, using the fact that the sum fits comfortably in eight bits. [David Muller had programmed a similar method for the ILLIAC I machine in 1954.]
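
The four steps of (62) carry over to Python one line each. The sketch assumes 64-bit arithmetic emulated by masking the product, and the names are ours. Each comment records what the fields of `y` hold after that step.

```python
MASK64 = (1 << 64) - 1
MU0, MU1, MU2 = 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F
A = 0x0101010101010101                     # a = (11111111)_256

def nu(x: int) -> int:
    """Sideways sum of a 64-bit x by the Gillies-Miller method (62)."""
    y = x - ((x >> 1) & MU0)               # 2-bit fields u_j = x_{2j+1} + x_{2j}
    y = (y & MU1) + ((y >> 2) & MU1)       # 4-bit fields v_j = u_{2j+1} + u_{2j}
    y = (y + (y >> 4)) & MU2               # 8-bit fields w_j = v_{2j+1} + v_{2j}
    return ((A * y) & MASK64) >> 56        # w_7 + ... + w_0 lands in the top byte
```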


smearing right 
Warren 
run of Is 
Lynch 

sum of bits, see sideways sum 

ones counting, see sideways 

sideways addition+ 

subsets 

largest 

smallest 

population count 

cardinality 

Wilkes 

Wheeler 

Gill 

Gillies 

Miller 


EDSAC 

remainder 


mod 2 n — 1 


Muller 
ILLIAC I 
If $x$ is expected to be "sparse," having at most a few 1 bits, we can use a faster method [P. Wegner, CACM 3 (1960), 322]:

$$\text{Set } \nu \leftarrow 0,\ y \leftarrow x. \text{ Then while } y \ne 0, \text{ set } \nu \leftarrow \nu + 1,\ y \leftarrow y \,\&\, (y - 1). \qquad (63)$$

A similar approach, using $y \leftarrow y \mid (y + 1)$, works when $x$ is expected to be "dense."
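
Both loops are sketched below in Python. The dense variant is our reading of the one-line remark above. It counts the 0 bits of an $n$-bit word by repeatedly setting the rightmost 0, with $n = 64$ assumed by default, and then subtracts that count from $n$.

```python
def nu_sparse(x: int) -> int:
    """Wegner's loop (63): one iteration per 1 bit of x."""
    v, y = 0, x
    while y != 0:
        v += 1
        y &= y - 1          # clear the rightmost 1 bit
    return v

def nu_dense(x: int, n: int = 64) -> int:
    """Dense variant (our reading of the text): one iteration per 0 bit of an n-bit x."""
    full = (1 << n) - 1
    zeros, y = 0, x
    while y != full:
        zeros += 1
        y |= y + 1          # set the rightmost 0 bit
    return n - zeros
```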


Bit reversal. For our next trick, let's change $x = (x_{63} \ldots x_1 x_0)_2$ to its left-right mirror image, $x^R = (x_0 x_1 \ldots x_{63})_2$. Anybody who has been following the developments so far, seeing methods like (50), (56), (57), and (62), will probably think, "Aha! Once again we can divide by 2 and conquer! If we've already discovered how to reverse 32-bit numbers, we can reverse 64-bit numbers almost as fast, because $(xy)^R = y^R x^R$. All we have to do is apply the 32-bit method in parallel to both halves of the register, then swap the left half with the right half."

Right. For example, we can reverse an 8-bit string in three easy steps:

    Given          x7 x6 x5 x4 x3 x2 x1 x0
    Swap bits      x6 x7 x4 x5 x2 x3 x0 x1
    Swap nyps      x4 x5 x6 x7 x0 x1 x2 x3          (64)
    Swap nybbles   x0 x1 x2 x3 x4 x5 x6 x7


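And six such easy steps will reverse 64 bits. Fortunately, each of the swapping operations turns out to be quite simple with the help of the magic masks $\mu_k$:

$$\begin{aligned}
y &\leftarrow (x \gg 1) \,\&\, \mu_0, & z &\leftarrow (x \,\&\, \mu_0) \ll 1, & x &\leftarrow y \mid z;\\
y &\leftarrow (x \gg 2) \,\&\, \mu_1, & z &\leftarrow (x \,\&\, \mu_1) \ll 2, & x &\leftarrow y \mid z;\\
y &\leftarrow (x \gg 4) \,\&\, \mu_2, & z &\leftarrow (x \,\&\, \mu_2) \ll 4, & x &\leftarrow y \mid z;\\
y &\leftarrow (x \gg 8) \,\&\, \mu_3, & z &\leftarrow (x \,\&\, \mu_3) \ll 8, & x &\leftarrow y \mid z;\\
y &\leftarrow (x \gg 16) \,\&\, \mu_4, & z &\leftarrow (x \,\&\, \mu_4) \ll 16, & x &\leftarrow y \mid z;\\
x &\leftarrow (x \gg 32) \mid ((x \ll 32) \bmod 2^{64}).
\end{aligned} \qquad (65)$$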



[Christopher Strachey foresaw some aspects of this construction in CACM 4 (1961), 146, and a similar ternary method was devised in 1973 by Bruce Baumgart (see exercise 49). The mature algorithm (65) was presented by Henry S. Warren, Jr., in Hacker's Delight (Addison–Wesley, 2002), 102.]
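
Algorithm (65) transcribes directly into Python. The sketch assumes a 64-bit word, reuses the masks of (48), and uses names of our own choosing.

```python
MASK64 = (1 << 64) - 1
MU = [MASK64 // ((1 << (1 << k)) + 1) for k in range(6)]

def reverse64(x: int) -> int:
    """Mirror image of a 64-bit x using the swaps of (65)."""
    for k in range(5):                        # swap bits, nyps, nybbles, bytes, wydes
        s = 1 << k
        y = (x >> s) & MU[k]                  # left member of each pair moves right
        z = (x & MU[k]) << s                  # right member of each pair moves left
        x = y | z
    return (x >> 32) | ((x << 32) & MASK64)   # finally swap the 32-bit halves
```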

But MMIX is once again able to trump this general-purpose technique with less traditional commands that do the job much faster. Consider

    rev GREG #0102040810204080; MOR x,x,rev; MOR x,rev,x;      (66)

The first MOR instruction reverses the bytes of x from big-endian to little-endian or vice versa, while the second reverses the bits within each byte.


Bit swapping. Suppose we only want to interchange two bits within a register, $x_i \leftrightarrow x_j$, where $i > j$. What would be a good way to proceed? (Dear reader, please pause for a moment and solve this problem in your head, or with pencil and paper, without looking at the answer below.)

Let $\delta = i - j$. Here is one solution (but don't peek until you're ready):

$$y \leftarrow (x \gg \delta) \,\&\, 2^j, \quad z \leftarrow (x \,\&\, 2^j) \ll \delta, \quad x \leftarrow (x \,\&\, m) \mid y \mid z, \quad\text{where } m = \overline{2^i \mid 2^j}. \qquad (67)$$


It uses two shifts and five bitwise Boolean operations, assuming that $i$ and $j$ are given constants. It resembles each of the first five lines of (65), except that a new mask $m$ is needed because $y$ and $z$ don't account for all of the bits of $x$.

We can, however, do better, saving one operation and one constant:

$$y \leftarrow (x \oplus (x \gg \delta)) \,\&\, 2^j, \qquad x \leftarrow x \oplus y \oplus (y \ll \delta). \qquad (68)$$

The first assignment now puts $x_i \oplus x_j$ into position $j$; the second changes $x_i$ to $x_i \oplus (x_i \oplus x_j) = x_j$ and $x_j$ to $x_j \oplus (x_i \oplus x_j) = x_i$, as desired. In general it's often wise to convert a problem of the form "change $x$ to $f(x)$" into a problem of the form "change $x$ to $x \oplus g(x)$," since the bit-difference $g(x)$ might be easy to calculate.

On the other hand, there's a sense in which (67) might be preferable to (68), because the assignments to $y$ and $z$ in (67) can sometimes be performed simultaneously. When expressed as a circuit, (67) has a depth of 4 while (68) has depth 5.
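
Solutions (67) and (68) are shown side by side in Python below. The sketch assumes nonnegative Python integers. Python's unbounded `~` clears exactly positions $i$ and $j$ when it builds $m$. The function names are ours.

```python
def swap_bits_67(x: int, i: int, j: int) -> int:
    """Interchange bits i > j of x using (67): two shifts, five Boolean ops."""
    delta = i - j
    y = (x >> delta) & (1 << j)            # old x_i, moved to position j
    z = (x & (1 << j)) << delta            # old x_j, moved to position i
    m = ~((1 << i) | (1 << j))             # every bit except positions i and j
    return (x & m) | y | z

def swap_bits_68(x: int, i: int, j: int) -> int:
    """Same interchange via (68): y holds x_i XOR x_j in position j."""
    delta = i - j
    y = (x ^ (x >> delta)) & (1 << j)
    return x ^ y ^ (y << delta)            # flips both bits exactly when they differ
```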

Operation (68) can of course be used to swap several pairs of bits simultaneously, when we use a general mask $\theta$ instead of $2^j$:

$$y \leftarrow (x \oplus (x \gg \delta)) \,\&\, \theta, \qquad x \leftarrow x \oplus y \oplus (y \ll \delta). \qquad (69)$$

Let us call this operation a "$\delta$-swap," because it allows us to swap any nonoverlapping pairs of bits that are $\delta$ places apart. The mask $\theta$ has a 1 in the rightmost position of each pair that's supposed to be swapped. For example, (69) will swap the leftmost 25 bits of a 64-bit word with the rightmost 25 bits, while leaving the 14 middle bits untouched, if we let $\delta = 39$ and $\theta = 2^{25} - 1 = \#1ffffff$. Indeed, there's an astonishing way to reverse 64 bits using $\delta$-swaps, namely

$$\begin{aligned}
y &\leftarrow (x \gg 1) \,\&\, \mu_0, \quad z \leftarrow (x \,\&\, \mu_0) \ll 1, & x &\leftarrow y \mid z,\\
y &\leftarrow (x \oplus (x \gg 4)) \,\&\, \#0300c0303030c303, & x &\leftarrow x \oplus y \oplus (y \ll 4),\\
y &\leftarrow (x \oplus (x \gg 8)) \,\&\, \#00c0300c03f0003f, & x &\leftarrow x \oplus y \oplus (y \ll 8),\\
y &\leftarrow (x \oplus (x \gg 20)) \,\&\, \#00000ffc00003fff, & x &\leftarrow x \oplus y \oplus (y \ll 20),\\
x &\leftarrow (x \gg 34) \mid ((x \ll 30) \bmod 2^{64}),
\end{aligned} \qquad (70)$$

saving two of the bitwise operations in (65) even though (65) looks "optimum."
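
The sketch below implements the $\delta$-swap (69) as a reusable Python function and then chains the steps of (70) with the constants given in the text. It assumes 64-bit words. The names are ours.

```python
MASK64 = (1 << 64) - 1
MU0 = 0x5555555555555555

def delta_swap(x: int, delta: int, theta: int) -> int:
    """(69): swap each bit j with bit j+delta wherever theta has a 1 in position j."""
    y = (x ^ (x >> delta)) & theta        # 1 where the two bits of a pair differ
    return x ^ y ^ (y << delta)           # flip both members of each differing pair

def reverse64_70(x: int) -> int:
    """Reverse 64 bits with the delta-swaps of (70)."""
    x = ((x >> 1) & MU0) | ((x & MU0) << 1)
    x = delta_swap(x, 4, 0x0300C0303030C303)
    x = delta_swap(x, 8, 0x00C0300C03F0003F)
    x = delta_swap(x, 20, 0x00000FFC00003FFF)
    return (x >> 34) | ((x << 30) & MASK64)
```

For example, `delta_swap(x, 39, (1 << 25) - 1)` exchanges the leftmost and rightmost 25 bits of `x` and leaves the middle 14 bits alone, as described above.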


*Bit permutation in general. The methods we've just seen can be extended to obtain an arbitrary permutation of the bits in a register. In fact, there always exist masks $\theta_0, \ldots, \theta_5, \hat\theta_4, \ldots, \hat\theta_0$ such that the following operations transform $x = (x_{63} \ldots x_1 x_0)_2$ into any desired rearrangement $x^\pi = (x_{63\pi} \ldots x_{1\pi} x_{0\pi})_2$ of its bits:

$$\begin{aligned}
x &\leftarrow 2^k\text{-swap of } x \text{ with mask } \theta_k, && \text{for } k = 0, 1, 2, 3, 4, 5;\\
x &\leftarrow 2^k\text{-swap of } x \text{ with mask } \hat\theta_k, && \text{for } k = 4, 3, 2, 1, 0.
\end{aligned} \qquad (71)$$

In general, a permutation of $2^d$ bits can be achieved with $2d - 1$ such steps, using appropriate masks $\theta_k$, $\hat\theta_k$, where the swap distances are respectively $2^0, 2^1, \ldots, 2^{d-1}, \ldots, 2^1, 2^0$.

To prove this fact, we can use a special case of the permutation networks discovered independently by A. M. Duguid and J. Le Corre in 1959, based on earlier work of D. Slepian [see V. E. Beneš, Mathematical Theory of Connecting Networks and Telephone Traffic (New York: Academic Press, 1965), Section 3.3].

Figure 12 shows a permutation network $P(2n)$ for $2n$ elements constructed from two permutation networks for $n$ elements, when $n = 4$. Each connection drawn between two lines represents a crossbar module that either leaves the line contents unaltered or interchanges them, as the data flows from left to right. Every setting of the individual crossbars therefore causes $P(2n)$ to produce a permutation of its inputs; conversely, we wish to show that any permutation of the $2n$ inputs can be achieved by some setting of the crossbars.

The construction of Fig. 12 is best understood by considering an example. Suppose we want to route the inputs $(0, 1, 2, 3, 4, 5, 6, 7)$ to $(3, 2, 4, 1, 6, 0, 5, 7)$, respectively. The first job is to determine the contents of the lines just after the first column of crossbars and just before the last column, since we can then use a similar method to set the crossbars in the inner $P(4)$'s. Thus, in the network


[Network diagram not reproduced.]
we want to find permutations $abcdefgh$ and $ABCDEFGH$ such that $\{a,b\} = \{0,1\}$, $\{c,d\} = \{2,3\}$, $\ldots$, $\{g,h\} = \{6,7\}$, $\{a,c,e,g\} = \{A,C,E,G\}$, $\{b,d,f,h\} = \{B,D,F,H\}$, $\{A,B\} = \{3,2\}$, $\{C,D\} = \{4,1\}$, $\ldots$, $\{G,H\} = \{5,7\}$. Starting at the bottom, let us choose $h = 7$, because we don't wish to disturb the contents of that line unless necessary. Then the following choices are forced:

$$H = 7;\ G = 5;\ e = 5;\ f = 4;\ D = 4;\ C = 1;\ a = 1;\ b = 0;\ F = 0;\ E = 6;\ g = 6. \qquad (73)$$

If we had chosen $h = 6$, the forcing pattern would have been similar but reversed:

$$F = 6;\ E = 0;\ a = 0;\ b = 1;\ D = 1;\ C = 4;\ e = 4;\ f = 5;\ H = 5;\ G = 7;\ g = 7. \qquad (74)$$

Options (73) and (74) can both be completed by choosing either $d = 3$ (hence $B = 3$, $A = 2$, $c = 2$) or $d = 2$ (hence $B = 2$, $A = 3$, $c = 3$).

In general the forcing pattern will go in cycles, no matter what permutation we begin with. To see this, consider the graph on eight vertices $\{ab, cd, ef, gh, AB, CD, EF, GH\}$ that has an edge from $uv$ to $UV$ whenever the pair of inputs connected to $uv$ has an element in common with the pair of outputs connected to $UV$. Thus, in our example the edges are $ab$–$EF$, $ab$–$CD$, $cd$–$AB$, $cd$–$AB$, $ef$–$CD$, $ef$–$GH$, $gh$–$EF$, $gh$–$GH$. We have a "double bond" between $cd$ and $AB$, since the inputs connected to $c$ and $d$ are exactly the outputs connected to $A$ and $B$. If we allow this slight bending of the strict definition of a graph, each vertex is adjacent to exactly two other vertices, and lowercase vertices are always adjacent to uppercase ones. Therefore the graph always consists of disjoint cycles of even length. In our example, the cycles are

           EF – gh
          /       \
        ab         GH          cd = AB,          (75)
          \       /
           CD – ef

where the longer cycle corresponds to (73) and (74). If there are $k$ different cycles, there will be $2^k$ different ways to specify the behavior of the first and last columns of crossbars.

[Fig. 12. The inside of a black box $P(2n)$ that permutes $2n$ elements in all possible ways, when $n > 1$. (Illustrated for $n = 4$.)]
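
The cycle-following argument also works as a procedure. The Python sketch below sets the first and last crossbar columns of $P(2n)$. It assumes that the routing is given as `out[q]`, the input that must reach output line `q`. It walks each cycle from the bottom line up and tries to leave the starting line uncrossed, as the text does with $h = 7$. All names are ours, and the sketch stops at the outer columns rather than recursing into the inner networks.

```python
def outer_columns(out):
    """First/last crossbar columns of P(2n) by following the forcing cycles.

    out[q] is the input that must reach output line q.  Returns side[v] for each
    input v: 0 if v travels through the upper inner network, 1 for the lower one.
    """
    m = len(out)
    pos = [0] * m
    for q, v in enumerate(out):
        pos[v] = q                       # output line that input v must reach
    side = [None] * m
    for start in range(m - 1, -1, -1):   # start at the bottom, as in the text
        if side[start] is not None:
            continue
        v, s = start, start & 1          # prefer to leave line 'start' uncrossed
        while side[v] is None:           # stops when the (even) cycle closes
            side[v] = s
            w = v ^ 1                    # input partner takes the other inner network
            side[w] = 1 - s
            v = out[pos[w] ^ 1]          # output partner of w must take network s
    return side
```

On the example above, `outer_columns([3, 2, 4, 1, 6, 0, 5, 7])` sends inputs 1, 2, 5, 6 through the upper network. This is choice (73) completed with $d = 3$.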

To complete the network, we can process the inner 4-element permutations in the same way; and any $2^d$-element permutation is achievable in this same recursive fashion. The resulting crossbar settings determine the masks $\theta_j$ and $\hat\theta_j$ of (71). Some choices of crossbars may lead to a mask that is entirely zero; then we can eliminate the corresponding stage of the computation.

If the input and output are identical on the bottom lines of the network, our construction shows how to ensure that none of the crossbars touching those lines are active. For example, the 64-bit algorithm in (71) could be used also with a 60-bit register, without needing the four extra bits for any intermediate results.

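Of course we can often beat the general procedure of (71) in special cases. For example, exercise 52 shows that method (71) needs nine swapping steps to transpose an $8 \times 8$ matrix, but in fact three swaps suffice:

    Given                     7-swap                    14-swap                   28-swap
    00 01 02 03 04 05 06 07   00 10 02 12 04 14 06 16   00 10 20 30 04 14 24 34   00 10 20 30 40 50 60 70
    10 11 12 13 14 15 16 17   01 11 03 13 05 15 07 17   01 11 21 31 05 15 25 35   01 11 21 31 41 51 61 71
    20 21 22 23 24 25 26 27   20 30 22 32 24 34 26 36   02 12 22 32 06 16 26 36   02 12 22 32 42 52 62 72
    30 31 32 33 34 35 36 37   21 31 23 33 25 35 27 37   03 13 23 33 07 17 27 37   03 13 23 33 43 53 63 73
    40 41 42 43 44 45 46 47   40 50 42 52 44 54 46 56   40 50 60 70 44 54 64 74   04 14 24 34 44 54 64 74
    50 51 52 53 54 55 56 57   41 51 43 53 45 55 47 57   41 51 61 71 45 55 65 75   05 15 25 35 45 55 65 75
    60 61 62 63 64 65 66 67   60 70 62 72 64 74 66 76   42 52 62 72 46 56 66 76   06 16 26 36 46 56 66 76
    70 71 72 73 74 75 76 77   61 71 63 73 65 75 67 77   43 53 63 73 47 57 67 77   07 17 27 37 47 57 67 77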
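
The three swaps can be sketched in Python for an $8 \times 8$ bit matrix. The sketch assumes that element $(i, j)$ lives in bit $8i + j$ of a 64-bit word, so that the swap distances 7, 14, and 28 match the table. The masks are our own derivation for that layout, and the names are ours.

```python
def delta_swap(x: int, delta: int, theta: int) -> int:
    """(69): swap bit j with bit j+delta wherever theta has a 1 in position j."""
    y = (x ^ (x >> delta)) & theta
    return x ^ y ^ (y << delta)

def transpose8x8(x: int) -> int:
    """Transpose an 8x8 bit matrix held in a 64-bit word (element (i, j) in bit 8i+j)."""
    x = delta_swap(x, 7, 0x00AA00AA00AA00AA)    # transpose each 2x2 block
    x = delta_swap(x, 14, 0x0000CCCC0000CCCC)   # transpose 2x2 blocks within each 4x4 block
    x = delta_swap(x, 28, 0x00000000F0F0F0F0)   # swap the off-diagonal 4x4 blocks
    return x
```

For example, a matrix whose row 0 is all 1s (`0xFF`) becomes one whose column 0 is all 1s (`0x0101010101010101`).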

The "perfect shuffle" is another bit permutation that arises frequently in practice. If $x = (\ldots x_2 x_1 x_0)_2$ and $y = (\ldots y_2 y_1 y_0)_2$ are any 2-adic integers, we define $x \ddagger y$ ("$x$ zip $y$," the zipper function of $x$ and $y$) by interleaving their bits:

$$x \ddagger y = (\ldots x_2 y_2 x_1 y_1 x_0 y_0)_2. \qquad (76)$$

This operation has important applications to the representation of 2-dimensional data, because a small change in either $x$ or $y$ usually causes only a small change in $x \ddagger y$. Notice also that the magic mask constants (47) satisfy

$$\mu_k \ddagger \mu_k = \mu_{k+1}. \qquad (77)$$

If $x$ appears in the left half of a register and $y$ appears in the right half, a perfect shuffle is the permutation that changes the register contents to $x \ddagger y$.

A sequence of $d - 1$ swapping steps will perfectly shuffle a $2^d$-bit register; in fact, exercise 53 shows that there are several ways to achieve this. Once again, therefore, we are able to improve on the $(2d-1)$-step method of (71) and Fig. 12.
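
The sketch below gives a brute-force Python version of the zipper function (76), a check of (77), and one possible sequence of $d - 1 = 5$ $\delta$-swaps that perfectly shuffles a 64-bit register. The masks are our own choice and need not match those of exercise 53. Each step swaps the two middle quarters of every block, which turns $[A_1 A_0 B_1 B_0]$ into $[A_1 B_1 A_0 B_0]$. All names are ours.

```python
MU = [((1 << 64) - 1) // ((1 << (1 << k)) + 1) for k in range(6)]   # used to check (77)

def zip_bits(x: int, y: int, nbits: int = 32) -> int:
    """x ‡ y of (76), restricted to the low nbits of x and y: bit 2j+1 is x_j, bit 2j is y_j."""
    z = 0
    for j in range(nbits):
        z |= ((x >> j) & 1) << (2 * j + 1)
        z |= ((y >> j) & 1) << (2 * j)
    return z

def delta_swap(x: int, delta: int, theta: int) -> int:
    """(69): swap bit j with bit j+delta wherever theta has a 1 in position j."""
    y = (x ^ (x >> delta)) & theta
    return x ^ y ^ (y << delta)

# One possible sequence of d - 1 = 5 delta-swaps for a 64-bit perfect shuffle
# (our own choice of masks): each step swaps the two middle quarters of every block.
SHUFFLE_STEPS = [(16, 0x00000000FFFF0000), (8, 0x0000FF000000FF00),
                 (4, 0x00F000F000F000F0), (2, 0x0C0C0C0C0C0C0C0C),
                 (1, 0x2222222222222222)]

def perfect_shuffle(w: int) -> int:
    """Map a register holding x (left half) and y (right half) to x ‡ y."""
    for delta, theta in SHUFFLE_STEPS:
        w = delta_swap(w, delta, theta)
    return w
```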

Conversely, suppose we're given the shuffled value $z = x \ddagger y$ in a $2^d$-bit register; is there an efficient way to extract the original value of $y$? Sure: If the $d - 1$ swaps that do a perfect shuffle are performed in reverse order, they'll undo the shuffle and recover both $x$ and $y$. But if only $y$ is wanted, we can save half of the work: Start with $y \leftarrow z \,\&\, \mu_0$; then set $y \leftarrow (y \oplus (y \gg 2^{k-1})) \,\&\, \mu_k$ for $k = 1, \ldots, d-1$. For example, when $d = 3$ this procedure goes $(0y_3 0y_2 0y_1 0y_0)_2 \mapsto (00y_3y_2 00y_1y_0)_2 \mapsto (0000y_3y_2y_1y_0)_2$. "Divide and conquer" conquers again.
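
The extraction of $y$ just described takes a few lines of Python. The sketch assumes $d \le 6$ and uses 64-bit magic masks. Their low $2^d$ bits coincide with $\mu_{d,k}$, so a smaller $d$ works unchanged. The name is ours.

```python
MU = [((1 << 64) - 1) // ((1 << (1 << k)) + 1) for k in range(6)]

def extract_y(z: int, d: int = 6) -> int:
    """Recover y from z = x ‡ y held in a 2^d-bit register (d <= 6), as described above."""
    y = z & MU[0]                                   # keep y's bits, the even positions
    for k in range(1, d):
        y = (y ^ (y >> (1 << (k - 1)))) & MU[k]     # close the gaps between 2^(k-1)-bit groups
    return y
```

With $d = 3$ and $z = (01000101)_2$, that is, $y = (1011)_2$ in the even positions, `extract_y(z, 3)` returns `0b1011`.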

Consider now a more general problem, where we want to extract and compress an arbitrary subset of a register's bits. Suppose we're given a $2^d$-bit word $z = (z_{2^d-1} \ldots z_1 z_0)_2$ and a mask $\chi = (\chi_{2^d-1} \ldots \chi_1 \chi_0)_2$ that has $s$ 1-bits; thus $\nu\chi = s$. The problem is to assemble the compact subword

$$y = (y_{s-1} \ldots y_1 y_0)_2 = (z_{j_{s-1}} \ldots z_{j_1} z_{j_0})_2, \qquad (78)$$

where $j_{s-1} > \cdots > j_1 > j_0$ are the indices where $\chi_j = 1$. For example, if $d = 3$ and $\chi = (10110010)_2$, we want to transform $z = (y_3 x_3 y_2 y_1 x_2 x_1 y_0 x_0)_2$ into $y = (y_3 y_2 y_1 y_0)_2$. (The problem of going from $x \ddagger y$ to $y$, considered above, is the special case $\chi = \mu_0$.) We know from (71) that $y$ can be found by $\delta$-swapping, at most $2d - 1$ times; but in this problem the relevant data always moves to the right, so we can speed things up by doing shifts instead of swaps.

Let's say that a $\delta$-shift of $x$ with mask $\theta$ is the operation

$$x \leftarrow x \oplus ((x \oplus (x \gg \delta)) \,\&\, \theta), \qquad (79)$$

which changes bit $x_j$ to $x_{j+\delta}$ if $\theta$ has 1 in position $j$, otherwise it leaves $x_j$ unchanged. Guy Steele discovered that there always exist masks $\theta_0, \theta_1, \ldots, \theta_{d-1}$ so that the general extraction problem (78) can be solved with a few $\delta$-shifts:

$$\text{Start with } x \leftarrow z; \text{ then do a } 2^k\text{-shift of } x \text{ with mask } \theta_k, \text{ for } k = 0, 1, \ldots, d-1; \text{ finally set } y \leftarrow x. \qquad (80)$$

In fact, the idea for finding appropriate masks is surprisingly simple. Suppose exactly $t_l$ of the bits need to travel $l$ places to the right. When we do a $2^k$-shift, we'll use $\theta_k$ to select those bits for which $\lfloor l/2^k \rfloor$ is odd.


For example, when $d = 3$ and $\chi = (10110010)_2$ we have $(t_0, t_1, t_2, t_3, t_4) = (0, 1, 0, 2, 5)$, including 0s to be shifted in from the left so that $\sum t_l = 2^d$. If we set $\theta_0 = (00111011)_2$, $\theta_1 = (00011110)_2$, $\theta_2 = (11111000)_2$, then (80) maps

$$(y_3 x_3 y_2 y_1 x_2 x_1 y_0 x_0)_2 \mapsto (y_3 x_3 x_3 y_2 y_1 x_1 x_1 y_0)_2 \mapsto (y_3 x_3 x_3 x_3 x_3 y_2 y_1 y_0)_2 \mapsto (0000 y_3 y_2 y_1 y_0)_2.$$

Exercise 69 proves that the bits being extracted will never interfere with each other during their journey. Furthermore, there's a slick way to compute the necessary masks $\theta_k$ dynamically in $O(d^2)$ steps (see exercise 70).
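
Here is a Python sketch of (79) and (80). It does not use the $O(d^2)$ method of exercise 70. Instead, it finds masks by simply tracking each bit's journey: the bit bound for position $p$ travels $l_p$ places, and vacated positions receive 0s that are treated as travelling $2^d - s$ places, as in the $t_l$ count above. These masks set only the bits that matter. For the example they are $(00011001)_2$, $(00000110)_2$, $(11111000)_2$, which differ from the text's $\theta_0$ and $\theta_1$ only in harmless "don't care" positions. The sketch assumes $\chi \ne 0$ (it simply returns 0 otherwise), and the names are ours.

```python
def delta_shift(x: int, delta: int, theta: int) -> int:
    """(79): bit j of x becomes bit j+delta wherever theta has a 1 in position j."""
    return x ^ ((x ^ (x >> delta)) & theta)

def steele_masks(chi: int, d: int) -> list:
    """Masks theta_0..theta_{d-1} for (80), found by tracking each bit's journey."""
    n = 1 << d
    js = [j for j in range(n) if (chi >> j) & 1]          # j_0 < j_1 < ... < j_{s-1}
    s = len(js)
    # Distance travelled by whatever ends up in position p; positions p >= s
    # receive 0s that are treated as travelling 2^d - s places from the left.
    dist = [js[p] - p if p < s else n - s for p in range(n)]
    masks = []
    for k in range(d):
        theta = 0
        for p, l in enumerate(dist):
            if (l >> k) & 1:                               # this bit moves 2^k places now
                dest = p + ((l >> (k + 1)) << (k + 1))     # its position after the move
                if dest < n:
                    theta |= 1 << dest
        masks.append(theta)
    return masks

def extract(z: int, chi: int, d: int, masks=None) -> int:
    """Compress the bits of z selected by chi into the low end, as in (78) and (80)."""
    if chi == 0:
        return 0
    x = z
    for k, theta in enumerate(masks if masks is not None else steele_masks(chi, d)):
        x = delta_shift(x, 1 << k, theta)
    return x
```

Passing `masks=[0b00111011, 0b00011110, 0b11111000]` reproduces the worked example with the text's own masks.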

A "sheep-and-goats" operation has been suggested for computer hardware, extending (78) to produce the general unshuffled word

$$(x_{r-1} \ldots x_1 x_0\, y_{s-1} \ldots y_1 y_0)_2 = (z_{i_{r-1}} \ldots z_{i_1} z_{i_0}\, z_{j_{s-1}} \ldots z_{j_1} z_{j_0})_2; \qquad (81)$$

here $i_{r-1} > \cdots > i_1 > i_0$ are the indices where $\chi_i = 0$. Any permutation of $2^d$ bits is achievable via at most $d$ sheep-and-goats operations (see exercise 73).
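
As a point of reference, here is a brute-force Python rendering of (81). It takes one bit at a time, so it only pins down what the operation produces and has none of the efficiency the text is after. The name is ours.

```python
def sheep_and_goats(z: int, chi: int, n: int) -> int:
    """Brute-force reference for (81): bits of z where chi is 1 go right, the rest go left."""
    ones = [(z >> j) & 1 for j in range(n) if (chi >> j) & 1]         # y_0, y_1, ..., y_{s-1}
    zeros = [(z >> i) & 1 for i in range(n) if not (chi >> i) & 1]   # x_0, x_1, ..., x_{r-1}
    result = 0
    for pos, bit in enumerate(ones + zeros):                          # y's low, then x's
        result |= bit << pos
    return result
```

With $\chi = (10110010)_2$, the word whose $y$ bits are all 1 and whose $x$ bits are all 0 maps to $(00001111)_2$.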

Shifting also allows us to go beyond permutations, to arbitrary mappings of bits within a register. Suppose we want to transform

$$x = (x_{2^d-1} \ldots x_1 x_0)_2 \;\mapsto\; x^\varphi = (x_{(2^d-1)\varphi} \ldots x_{1\varphi} x_{0\varphi})_2, \qquad (82)$$

where $\varphi$ is any of the $(2^d)^{2^d}$ functions from the set $\{0, 1, \ldots, 2^d-1\}$ into itself. K. M. Chung and C. K. Wong [IEEE Transactions C-29 (1980), 1029–1032] discovered an attractive way to do this in $O(d)$ steps by using cyclic $\delta$-shifts, which are like (79) except that we set

$$x \leftarrow x \oplus \bigl((x \oplus (x \gg \delta) \oplus (x \ll (2^d - \delta))) \,\&\, \theta\bigr). \qquad (83)$$

Their idea is to let $c_l$ be the number of indices $j$ such that $j\varphi = l$, for $0 \le l < 2^d$. Then they find masks $\theta_0, \theta_1, \ldots, \theta_{d-1}$ with the property that a cyclic $2^k$-shift of $x$ with mask $\theta_k$, done successively for $0 \le k < d$, will transform $x$ into a number $x'$ that contains exactly $c_l$ copies of bit $x_l$ for each $l$. Finally the general permutation procedure (71) can be used to change $x'$ into $x^\varphi$.

For example, suppose $d = 3$ and $x^\varphi = (x_3 x_1 x_1 x_0 x_3 x_7 x_5 x_5)_2$. Then we have $(c_0, c_1, c_2, c_3, c_4, c_5, c_6, c_7) = (1, 2, 0, 2, 0, 2, 0, 1)$. Using masks $\theta_0 = (00011100)_2$, $\theta_1 = (01001001)_2$, and $\theta_2 = (00100000)_2$, three cyclic $2^k$-shifts now take

$$x = (x_7x_6x_5x_4x_3x_2x_1x_0)_2 \mapsto (x_7x_6x_5x_5x_4x_3x_1x_0)_2 \mapsto (x_7x_0x_5x_5x_5x_3x_1x_3)_2 \mapsto (x_7x_0x_1x_5x_5x_3x_1x_3)_2 = x'.$$

Then come five $\delta$-swaps:

$$x' \mapsto (x_0x_7x_5x_1x_3x_5x_3x_1)_2 \mapsto (x_0x_7x_5x_1x_3x_1x_3x_5)_2 \mapsto (x_0x_1x_3x_1x_3x_7x_5x_5)_2 \mapsto (x_3x_1x_0x_1x_3x_7x_5x_5)_2 \mapsto (x_3x_1x_1x_0x_3x_7x_5x_5)_2 = x^\varphi;$$

we're done! Of course any 8-bit mapping can be achieved more quickly by brute force, one bit at a time; the method of Chung and Wong becomes much more impressive in a 256-bit register. Even with MMIX's 64-bit registers it's pretty good, needing at most 96 cycles in the worst case.

To find $\theta_0$, we use the fact that $\sum c_l = 2^d$, and we look at $\Sigma_{\rm even} = \sum c_{2l}$ and $\Sigma_{\rm odd} = \sum c_{2l+1}$. If $\Sigma_{\rm even} = \Sigma_{\rm odd} = 2^{d-1}$, we can set $\theta_0 = 0$ and omit the cyclic 1-shift. But if, say, $\Sigma_{\rm even} < \Sigma_{\rm odd}$, we find an even $l$ with $c_l = 0$. Cyclically shifting the bits $l, l+1, \ldots, l+t$ (modulo $2^d$) for some $t$ will produce new counts $(c'_0, \ldots, c'_{2^d-1})$ for which $\Sigma'_{\rm even} = \Sigma'_{\rm odd} = 2^{d-1}$; so $\theta_0 = 2^l + \cdots + 2^{l+t}$, with the exponents taken modulo $2^d$. Then we can deal with the bits in even and odd positions separately, using the same method, until getting down to 1-bit subwords. Exercise 74 has the details.
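
The first phase of the example can be replayed in Python. The sketch implements the cyclic $\delta$-shift (83) on a $2^d$-bit word and joins the two shifted copies with `|`, which equals the text's $\oplus$ because the copies don't overlap. To show which original bit lands in each position, it runs the shifts on $d$ "bit planes" that encode the position indices. This tracking device is our own and is not part of Chung and Wong's method, and all names are ours.

```python
def cyclic_delta_shift(x: int, delta: int, theta: int, d: int) -> int:
    """(83) on a 2^d-bit word: bit j becomes bit (j + delta) mod 2^d wherever theta_j = 1."""
    n = 1 << d
    rotated = (x >> delta) | ((x << (n - delta)) & ((1 << n) - 1))   # rotate right by delta
    return x ^ ((x ^ rotated) & theta)

def trace_indices(masks, d: int) -> list:
    """Which original bit index ends up in each position after cyclic 2^k-shifts.

    Runs the shifts on d 'bit planes' (plane b holds bit b of every position's
    index); since (83) acts the same way on every plane, reading the planes back
    tells us, for each position, the index of the bit it now holds.
    """
    n = 1 << d
    planes = [sum(((j >> b) & 1) << j for j in range(n)) for b in range(d)]
    for k, theta in enumerate(masks):
        planes = [cyclic_delta_shift(p, 1 << k, theta, d) for p in planes]
    return [sum(((planes[b] >> j) & 1) << b for b in range(d)) for j in range(n)]
```

For the masks above, `trace_indices([0b00011100, 0b01001001, 0b00100000], 3)` lists positions 0 through 7 as holding bits 3, 1, 3, 5, 5, 1, 0, 7. This is $x' = (x_7x_0x_1x_5x_5x_3x_1x_3)_2$, with the counts $c_l$ computed earlier.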

Working with fragmented fields. Instead of extracting bits from various parts of a word and gathering them together, we can often manipulate those bits directly in their original positions.

For example, suppose we want to run through all subsets of a given set $U$, where (as usual) the set is specified by a mask $\chi$ such that $[k \in U] = (\chi \gg k) \,\&\, 1$. If $x \subseteq \chi$ and $x \ne \chi$, there's an easy way to calculate the next largest subset of $U$ in lexicographic order, namely the smallest integer $x' > x$ such that $x' \subseteq \chi$:

$$x' = (x - \chi) \,\&\, \chi. \qquad (84)$$

In the special case when $x = 0$ and $\chi \ne 0$, we've already seen in (37) that this formula produces the rightmost bit of $\chi$, which corresponds to the lexicographically smallest nonempty subset of $U$.

Why does formula (84) work? Imagine adding 1 to the number $x \mid \bar\chi$, which has 1s wherever $\chi$ is 0. A carry will propagate through those 1s until it reaches the rightmost bit position where $x$ has a 0 and $\chi$ has a 1; furthermore all bits to the right of that position will become zero. Therefore $x' = ((x \mid \bar\chi) + 1) \,\&\, \chi$. But we have $(x \mid \bar\chi) + 1 = (x + \bar\chi) + 1 = x + (\bar\chi + 1) = x - \chi$ when $x \subseteq \chi$. QED.

Notice further that $x' = 0$ if and only if $x = \chi$. So we'll know when we've found the largest subset. Exercise 79 shows how to go back to $x$, given $x'$.

We might also want to run through all elements of a subcube, for example to find all bit patterns that match a specification like $*10{*}1{*}01$, consisting of 0s, 1s, and $*$s (don't-cares). Such a specification can be represented by asterisk codes $a = (a_{n-1} \ldots a_0)_2$ and bit codes $b = (b_{n-1} \ldots b_0)_2$, as in exercise 7.1.1–30; our example corresponds to $a = (10010100)_2$, $b = (01001001)_2$. The problem of enumerating all subsets of a set is the special case where $a = \chi$ and $b = 0$. In the more general subcube problem, the successor of a given bit pattern $x$ is

$$x' = ((x - (a + b)) \,\&\, a) + b. \qquad (85)$$

Suppose the bits of $z = (z_{n-1} \ldots z_0)_2$ have been stitched together from two subwords $x = (x_{r-1} \ldots x_0)_2$ and $y = (y_{s-1} \ldots y_0)_2$, where $r + s = n$, using an arbitrary mask $\chi$ for which $\nu\chi = s$ to govern the stitching. For example, $z = (y_2 x_4 x_3 y_1 x_2 y_0 x_1 x_0)_2$ when $n = 8$ and $\chi = (10010100)_2$. We can think of $z$ as a "scattered accumulator," in which alien bits $x_i$ lurk among friendly bits $y_j$. From this viewpoint the problem of finding successive elements of a subcube is essentially the problem of computing $y + 1$ inside a scattered accumulator $z$, without changing the value of $x$. The sheep-and-goats operation (81) would untangle $x$ and $y$; but it's expensive, and (85) shows that we can solve the problem without it. We can, in fact, compute $y + y'$ when $y' = (y'_{s-1} \ldots y'_0)_2$ is any value inside a scattered accumulator $z'$, if $y$ and $y'$ both appear in the positions specified by $\chi$: Consider $t = z \,\&\, \chi$ and $t' = z' \,\&\, \chi$. If we form the sum $(t \mid \bar\chi) + t'$, all carries that occur in a normal addition $y + y'$ will propagate through the blocks of 1s in $\bar\chi$, just as if the scattered bits were adjacent. Thus

$$((z \,\&\, \chi) + (z' \,\&\, \chi) + \bar\chi) \,\&\, \chi \qquad (86)$$

is the sum of $y$ and $y'$, modulo $2^s$, scattered according to the mask $\chi$.
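
Formulas (84), (85), and (86) are sketched in Python below. The sketch assumes nonnegative operands; Python's unbounded integers make $x - \chi$ behave 2-adically, so no extra masking is needed in (84) and (85). The helper `scatter` and all other names are ours. `scatter` exists only to build test inputs for (86).

```python
def subsets(chi: int):
    """All x ⊆ chi in increasing order, stepping with x' = (x - chi) & chi from (84)."""
    x = 0
    while True:
        yield x
        if x == chi:                  # x' would be 0: we've reached the largest subset
            return
        x = (x - chi) & chi

def subcube(a: int, b: int):
    """All patterns matching asterisk codes a and bit codes b (a & b == 0), via (85)."""
    x = b
    while True:
        yield x
        if x == a | b:                # every * position is 1: last pattern
            return
        x = ((x - (a + b)) & a) + b

def scatter(y: int, chi: int, n: int) -> int:
    """Helper for checking (86): place the bits of y in the positions where chi has 1s."""
    positions = [p for p in range(n) if (chi >> p) & 1]
    return sum(((y >> i) & 1) << p for i, p in enumerate(positions))

def scattered_add(z: int, z2: int, chi: int, n: int) -> int:
    """(86): y + y' modulo 2^s, where y and y' are scattered in z and z2 under mask chi."""
    chibar = ~chi & ((1 << n) - 1)
    return ((z & chi) + (z2 & chi) + chibar) & chi
```

For instance, `list(subsets(0b1011))` yields 0, 1, 2, 3, 8, 9, 10, 11. `subcube(0b10010100, 0b01001001)` yields the eight matches of $*10{*}1{*}01$ in increasing order.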

Tweaking several bytes at once. Instead of concentrating on the data in one field within a word, we often want to deal simultaneously with two or more subwords, performing calculations on each of them in parallel. For example, many applications need to process long sequences of bytes, and we can gain speed by acting on eight bytes at a time; we might as well use all 64 bits that our machine provides. General multibyte techniques were introduced by Leslie Lamport in CACM 18 (1975), 471–475, and subsequently extended by many programmers.

Suppose first that we simply wish to take two sequences of bytes and find their sum, regarding them as coordinates of vectors, doing arithmetic modulo 256 in each byte. Algebraically speaking, we're given 8-byte vectors $x = (x_7 \ldots x_1 x_0)_{256}$ and $y = (y_7 \ldots y_1 y_0)_{256}$; we want to compute $z = (z_7 \ldots z_1 z_0)_{256}$, where $z_j = (x_j + y_j) \bmod 256$ for $0 \le j < 8$. Ordinary addition of $x$ to $y$ doesn't quite work, because we need to prevent carries from propagating between bytes. So we separate out the high-order bits and deal with them separately:

$$\begin{aligned}
z &\leftarrow (x \oplus y) \,\&\, h, \quad\text{where } h = \#8080808080808080;\\
z &\leftarrow ((x \,\&\, \bar h) + (y \,\&\, \bar h)) \oplus z.
\end{aligned} \qquad (87)$$

The total time for MMIX to do this is $6\upsilon$, plus $3\mu + 3\upsilon$ if we also count the time to load $x$, load $y$, and store $z$. By contrast, eight one-byte additions (LDBU, LDBU, ADDU, and STBU, repeated eight times) would cost $8 \times (3\mu + 4\upsilon) = 24\mu + 32\upsilon$. Parallel subtraction of bytes is just as easy (see exercise 88).
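
Formula (87) looks like this in Python, assuming 64-bit words emulated by masking. The names `H`, `LOW7`, and `bytes_add` are ours.

```python
MASK64 = (1 << 64) - 1
H = 0x8080808080808080                    # h: the high bit of every byte
LOW7 = ~H & MASK64                        # h-bar: the low 7 bits of every byte

def bytes_add(x: int, y: int) -> int:
    """(87): eight bytewise additions modulo 256 with no carries between bytes."""
    z = (x ^ y) & H                       # high bits of the sums, ignoring incoming carries
    return ((x & LOW7) + (y & LOW7)) ^ z  # 7-bit sums; their carries are folded in by the XOR
```

Each low-order sum is at most $127 + 127 = 254$, so no carry can leave its byte. A carry into the high bit is absorbed by the final $\oplus$.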

We can also compute bytewise averages, with $z_j = \lfloor (x_j + y_j)/2 \rfloor$ for each $j$:

$$\begin{aligned}
z &\leftarrow ((x \oplus y) \,\&\, \bar l) \gg 1, \quad\text{where } l = \#0101010101010101;\\
z &\leftarrow (x \,\&\, y) + z.
\end{aligned} \qquad (88)$$

This elegant trick, suggested by H. G. Dietz, is based on the well-known formula

$$x + y = (x \oplus y) + ((x \,\&\, y) \ll 1) \qquad (89)$$

for radix-2 addition. (We can implement (88) with four MMIX instructions, not five, because a single MOR operation will change $x \oplus y$ to $((x \oplus y) \,\&\, \bar l) \gg 1$.)
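
Here is a Python sketch of (88) under the same 64-bit assumption; the names are ours. Clearing bit 0 of every byte before the shift keeps any bit from sliding into the byte below.

```python
MASK64 = (1 << 64) - 1
L = 0x0101010101010101                    # l: the low bit of every byte

def bytes_avg(x: int, y: int) -> int:
    """(88): z_j = floor((x_j + y_j) / 2) in every byte, via Dietz's trick."""
    z = ((x ^ y) & ~L & MASK64) >> 1      # half the XOR, with no bit crossing a byte boundary
    return (x & y) + z                    # (89) halved: (x + y)/2 = (x & y) + (x ^ y)/2
```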

Exercises 88–93 and 100–104 develop these ideas further, showing how to do mixed-radix arithmetic, as well as such things as the addition and subtraction of vectors whose components are treated modulo $m$ when $m$ needn't be a power of 2.

In essence, we can regard the bits, bytes, or other subfields of a register as if they were elements of an array of independent microprocessors, acting independently on their own subproblems yet tightly synchronized, and communicating with each other via shift instructions and carry bits. Computer designers have been interested for many years in the development of parallel processors with a so-called SIMD architecture, namely a "Single Instruction stream with Multiple Data streams"; see, for example, S. H. Unger, Proc. IRE 46 (1958), 1744–1750. The increased availability of 64-bit registers has meant that programmers of ordinary sequential computers are now able to get a taste of SIMD processing. Indeed, computations such as (87), (88), and (89) are called SWAR methods ("SIMD Within A Register"), a name coined by R. J. Fisher and H. G. Dietz [see Lecture Notes in Computer Science 1656 (1999), 290–305].

Of course bytes often contain alphabetic data as well as numbers, and one of the most common programming tasks is to search through a long string of characters in order to find the first appearance of some particular byte value. For example, strings are often represented as a sequence of nonzero bytes terminated by 0. In order to locate the end of a string quickly, we need a fast way to determine whether all eight bytes of a given word $x$ are nonzero (because they usually are). Several fairly good solutions to this problem were found by Lamport and others; but Alan Mycroft discovered in 1987 that three instructions actually suffice:

$$t \leftarrow h \,\&\, (x - l) \,\&\, \bar x, \qquad (90)$$

where $h$ and $l$ appear in (87) and (88). If each byte $x_j$ is nonzero, $t$ will be zero; for $(x_j - 1) \,\&\, \bar x_j$ will be $2^{\rho x_j} - 1$, which is always less than $\#80 = 2^7$. But if $x_j = 0$, while its right neighbors $x_{j-1}, \ldots, x_0$ (if any) are all nonzero, the subtraction $x - l$ will produce $\#ff$ in byte $j$, and $t$ will be nonzero. In fact, $\rho t$ will be $8j + 7$.

Caution: Although the computation in (90) pinpoints the rightmost zero byte of $x$, we cannot deduce the position of the leftmost zero byte from the value of $t$ alone. (See exercise 94.) In this respect the little-endian convention proves to be preferable to the corresponding big-endian behavior. An application that needs to locate the leftmost zero byte can use (90) to skip quickly over nonzeros, but then it must fall back on a slower method when the search has been narrowed down to eight finalists. The following 4-operation formula produces a completely precise test value $t = (t_7 \ldots t_1 t_0)_{256}$, in which $t_j = 128[x_j = 0]$ for each $j$:

$$t \leftarrow h \,\&\, \overline{x \mid ((x \mid h) - l)}. \qquad (91)$$

The leftmost zero byte of $x$ is now $x_j$, where $\lambda t = 8j + 7$.
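
Both tests are sketched in Python below. The sketch assumes 64-bit words emulated by masking and, in the examples, big-endian packing via `int.from_bytes`, so that the first character of a string is byte 7. The function names are ours.

```python
MASK64 = (1 << 64) - 1
H = 0x8080808080808080                 # h
L = 0x0101010101010101                 # l

def has_zero_byte(x: int) -> int:
    """Mycroft's test (90): nonzero iff some byte of x is zero (borrows may add false flags)."""
    return H & ((x - L) & MASK64) & ~x

def zero_byte_flags(x: int) -> int:
    """(91): byte j of the result is 0x80 exactly when byte j of x is 0."""
    return H & ~(x | (((x | H) - L) & MASK64)) & MASK64
```

For the string "ab\0defgh", the zero is byte 5, and `has_zero_byte` returns `0x80 << 40`, whose $\rho$ is $8 \cdot 5 + 7 = 47$.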

Incidentally, the single MMIX instruction 'BDIF t,l,x' solves the zero-byte problem immediately by setting each byte $t_j$ of $t$ to $[x_j = 0]$, because $1 \dot{-} x = [x = 0]$. But we are primarily interested here in fairly universal techniques that don't rely on exotic hardware; MMIX's special features will be discussed later.

Now that we know a fast way to find the first 0, we can use the same ideas to search for any desired byte value. For example, to test if any byte of $x$ is the newline character (#a), we simply look for a zero byte in $x \oplus \#0a0a0a0a0a0a0a0a$.

And these techniques also open up many other doors. Suppose, for instance, that we want to compute z = (z_7 ... z_1 z_0)_256 from x and y, where z_j = x_j when x_j = y_j but z_j = ‘*’ when x_j ≠ y_j. (Thus if x = "beaching" and y = "belching", we’re supposed to set z ← "be*ching".) It’s easy:

$$\begin{aligned}
t &\gets h \mathbin{\&} \bigl((x \oplus y) \mid (((x \oplus y) \mid h) - l)\bigr);\\
m &\gets (t \ll 1) - (t \gg 7);\\
z &\gets x \oplus \bigl((x \oplus \texttt{"********"}) \mathbin{\&} m\bigr).
\end{aligned} \qquad (92)$$

The first step uses (91), minus its complementation, to flag the high-order bits in each byte where x_j ≠ y_j. The next step creates a mask that highlights those bytes; the mask is #00 if x_j = y_j and #ff otherwise. And the last step, which could also be written z ← (x & m̄) | ("********" & m), sets z_j ← x_j or z_j ← ‘*’ depending on the mask.
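Formula (92) can be checked on the "beaching"/"belching" example. In this Python sketch the `pack`/`unpack` helpers, which put eight ASCII characters into one 64-bit word, are assumptions made for illustration; since every step of (92) is bytewise, the byte order chosen does not matter.

```python
MASK64 = (1 << 64) - 1
L = 0x0101010101010101
H = 0x8080808080808080
STARS = int.from_bytes(b"********", "big")


def pack(s):
    """Illustrative helper: eight ASCII characters -> one 64-bit word."""
    return int.from_bytes(s.encode("ascii"), "big")


def unpack(w):
    return w.to_bytes(8, "big").decode("ascii")


def star_mismatches(x, y):
    d = x ^ y
    t = H & (d | ((d | H) - L))            # 0x80 in each byte where x_j != y_j
    m = ((t << 1) - (t >> 7)) & MASK64     # widen each flag into a 0xff byte mask
    return x ^ ((x ^ STARS) & m)           # '*' where the mask is 0xff, else x_j


print(unpack(star_mismatches(pack("beaching"), pack("belching"))))  # be*ching
```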
Operations (90) and (91) were originally designed as tests for bytes that are zero; but a closer look reveals that we can more wisely regard them as tests for bytes that are less than 1. Indeed, if we replace l by c · l = (cccccccc)_256 in either formula, where c is any positive constant ≤ 128, we can use (90) or (91) to see if x contains any bytes that are less than c. Furthermore the comparison values c need not be the same in every byte position; and with a bit more work we can also do bytewise comparison in the cases where c > 128. Here’s an 8-step formula that sets t_j ← 128[x_j < y_j] for each byte position j in the test word t:

$$t \gets h \mathbin{\&} \langle \bar{x}\, y\, \bar{z} \rangle, \qquad\text{where } z = (x \mid h) - (y \mathbin{\&} \bar{h}). \qquad (93)$$

(See exercise 96.) The median operation in this general formula can often be simplified; for example, (93) reduces to (91) when y = l, because $h \mathbin{\&} \langle \bar{x}\, l\, \bar{z} \rangle = h \mathbin{\&} \overline{x \mid z}$.
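Here is a sketch of the bytewise comparison (93) in Python, assuming unsigned 8-bit bytes in a 64-bit word; `median` spells out the majority operation ⟨abc⟩ bit by bit, and the byte-at-a-time function is included only as a reference for checking.

```python
MASK64 = (1 << 64) - 1
H = 0x8080808080808080
NOT_H = ~H & MASK64


def median(a, b, c):
    """Bitwise majority function <abc>."""
    return (a & b) | (b & c) | (a & c)


def bytes_less(x, y):
    """(93): byte j of the result is 0x80 iff x_j < y_j as unsigned bytes."""
    z = (x | H) - (y & NOT_H)   # each byte (x_j | 128) - (y_j & 127) >= 1, so no borrows
    return H & median(~x & MASK64, y, ~z & MASK64)


def bytes_less_reference(x, y):
    """Byte-at-a-time version of the same test."""
    return sum(0x80 << (8 * j) for j in range(8)
               if ((x >> (8 * j)) & 0xFF) < ((y >> (8 * j)) & 0xFF))


print(hex(bytes_less(0x00FF7F8001020304, 0x0180FF7F01030204)))  # 0x8000800000800000
```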

Once we’ve found a nonzero t in (90) or (91) or (93), we might want to compute ρt or λt in order to discover the index j of the rightmost or leftmost byte that has been flagged. The problem of calculating ρ or λ is now simpler than before, since t can take on only 256 different values. Indeed, the operation

$$j \gets \mathit{table}\bigl[((a \cdot t) \bmod 2^{64}) \gg 56\bigr], \qquad\text{where } a = \frac{2^{56}-1}{2^{7}-1},$$

now suffices to compute j, given an appropriate 256-byte table. And the multiplication here can often be performed faster by doing three shift-and-add operations, “t ← t + (t ≪ 7), t ← t + (t ≪ 14), t ← t + (t ≪ 28),” instead.

Broadword computing. We’ve now seen more than a dozen ways in which 
a computer’s bitwise operations can produce astonishing results at high speed, 
and the exercises below contain many more such surprises. 

Elwyn Berlekamp has remarked that computer chips containing N flip-flops continue to be built with ever larger values of N, yet in practice only O(log N) of those components are flipping or flopping at any given moment. The surprising effectiveness of bitwise operations suggests that computers of the future might make use of this untapped potential by having enhanced memory units that are able to do efficient n-bit computations for fairly large values of n. To prepare for that day, we ought to have a good name for the concept of manipulating “wide words.” Lyle Ramshaw has suggested the pleasant term broadword, so that we can speak of n-bit quantities as broadwords of width n.

Many of the methods we’ve discussed are 2-adic, in the sense that they work correctly with binary numbers that have arbitrary (even infinite) precision. For example, the operation x & −x always extracts 2^{ρx}, the least significant 1 bit of any nonzero 2-adic integer x. But other methods have an inherently broadword nature, such as the methods that use O(d) steps to perform sideways addition or bit permutation of 2^d-bit words. Broadword computing is the art of dealing with n-bit words, when n is a parameter that is not extremely small.

Some broadword algorithms are of theoretical interest only, because they are 
efficient only in an asymptotic sense when n exceeds the size of the universe. But 
others are eminently practical even when n = 64. And in general, a broadword 
mindset often suggests good techniques. 
One fascinating-but-impractical fact about broadword operations is the discovery by M. L. Fredman and D. E. Willard that O(1) broadword steps suffice to evaluate the function λx = ⌊lg x⌋ for any nonzero n-bit number x, no matter how big n is. Here is their remarkable scheme, when n = g² and g is a power of 2:

$$\begin{aligned}
t_1 &\gets h \mathbin{\&} (x \mid ((x \mid h) - l)), && \text{where } h = 2^{g-1} l \text{ and } l = (2^n - 1)/(2^g - 1);\\
y &\gets \bigl(((a \cdot t_1) \bmod 2^n) \gg (n - g)\bigr) \cdot l, && \text{where } a = (2^{n-g} - 1)/(2^{g-1} - 1);\\
t_2 &\gets h \mathbin{\&} (y \mid ((y \mid h) - b)), && \text{where } b = (2^{n+g} - 1)/(2^{g+1} - 1);\\
m &\gets (t_2 \ll 1) - (t_2 \gg (g - 1)), \quad m \gets m \oplus (m \gg g);\\
z &\gets \bigl(((l \cdot (x \mathbin{\&} m)) \bmod 2^n) \gg (n - g)\bigr) \cdot l;\\
t_3 &\gets h \mathbin{\&} (z \mid ((z \mid h) - b));\\
\lambda &\gets \bigl((l \cdot ((t_2 \gg (2g - \lg g - 1)) + (t_3 \gg (2g - 1)))) \bmod 2^n\bigr) \gg (n - g).
\end{aligned} \qquad (95)$$

(See exercise 106.) The method fails to be practical because five of these 29 steps are multiplications, so they aren’t really “bitwise” operations. In fact, we’ll prove later that multiplication by a constant requires at least Ω(log n) bitwise steps.
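For n = 64 the conditions n = g² and g a power of 2 hold with g = 8, so (95) can be transcribed step by step. The following Python sketch does that, assuming 0 < x < 2^64; it is meant only to make the 29 steps concrete, and a comparison with Python's `int.bit_length` serves as a check.

```python
N, G = 64, 8                                 # n = g^2, with g a power of 2
MASK = (1 << N) - 1
L = (2**N - 1) // (2**G - 1)                 # l: a 1 at the bottom of each g-bit field
H = (2**(G - 1) * L) & MASK                  # h: the top bit of each g-bit field
A = (2**(N - G) - 1) // (2**(G - 1) - 1)
B = (2**(N + G) - 1) // (2**(G + 1) - 1)     # field k holds the value 2^k
LG_G = G.bit_length() - 1                    # lg g


def fredman_willard_lambda(x):
    """lambda x = floor(lg x) for 0 < x < 2^N, following the steps of (95)."""
    t1 = H & (x | ((x | H) - L))                     # flag the nonzero fields of x
    y = (((A * t1) & MASK) >> (N - G)) * L           # g-bit summary of t1, replicated
    t2 = H & (y | ((y | H) - B))                     # field k flagged iff y_k >= 2^k
    m = ((t2 << 1) - (t2 >> (G - 1))) & MASK         # mask of fields 0..j
    m ^= m >> G                                      # field j only: the leftmost nonzero one
    z = (((L * (x & m)) & MASK) >> (N - G)) * L      # that field of x, replicated
    t3 = H & (z | ((z | H) - B))                     # field k flagged iff z_k >= 2^k
    s = (t2 >> (2 * G - LG_G - 1)) + (t3 >> (2 * G - 1))
    return ((L * s) & MASK) >> (N - G)               # g*j + lambda(x_j)


print(fredman_willard_lambda(0x0123456789ABCDEF))    # 56
```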

A multiplication-free way to find λx, with only O(log log n) bitwise broadword operations, was discovered in 1997 by Gerth Brodal, whose method is even more remarkable than (95). It is based on a formula analogous to (49),

$$\lambda x = [\lambda x = \lambda(x \mathbin{\&} \bar\mu_0)] + 2[\lambda x = \lambda(x \mathbin{\&} \bar\mu_1)] + 4[\lambda x = \lambda(x \mathbin{\&} \bar\mu_2)] + \cdots, \qquad (96)$$

and the fact that the relation λx = λy is easily tested (see (58)):

Algorithm B (Binary logarithm). This algorithm uses n-bit operations to compute λx = ⌊lg x⌋, assuming that 0 < x < 2^n and n = d · 2^d.

B1. [Scale down.] Set λ ← 0. Then set λ ← λ + 2^k and x ← x ≫ 2^k if x ≥ 2^{2^k}, for ⌈lg n⌉ > k ≥ d.

B2. [Replicate.] (At this point 0 < x < 2^{2^d}; the remaining task is to increase λ by ⌊lg x⌋. We will replace x by d copies of itself, in 2^d-bit fields.) Set x ← x | (x ≪ 2^{d+k}) for 0 ≤ k < ⌈lg d⌉.

B3. [Change leading bits.] Set $y \gets x \mathbin{\&} \overline{(\mu_{d,d-1} \ldots \mu_{d,1}\,\mu_{d,0})_{2^{2^d}}}$. (See (48).)

B4. [Compare all fields.] Set t ← h & (y | ((y | h) − (x ⊕ y))), where h = (2^{2^d−1} ... 2^{2^d−1} 2^{2^d−1})_{2^{2^d}}.

B5. [Compress bits.] Set t ← (t + (t ≪ (2^{d+k} − 2^k))) mod 2^n for 0 ≤ k < ⌈lg d⌉.

B6. [Finish.] Finally, set λ ← λ + (t ≫ (n − d)). ■

This algorithm is actually competitive with ( 56 ) when n = 64 (see exercise 107). 
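Algorithm B is easy to try for n = 64, since 64 = d · 2^d with d = 4: the fields are 16 bits wide, and μ_{4,0}, …, μ_{4,3} are the 16-bit masks #5555, #3333, #0f0f, #00ff. The Python sketch below follows steps B1–B6 under those assumptions; it is written for clarity, not speed.

```python
N, D = 64, 4                                    # n = d * 2^d
MASK = (1 << N) - 1
W = 1 << D                                      # field width 2^d = 16
MU = [0x5555, 0x3333, 0x0F0F, 0x00FF]           # mu_{4,0}, ..., mu_{4,3}
MU_PACKED = sum(MU[j] << (W * j) for j in range(D))
HIGH = sum(1 << (W * j + W - 1) for j in range(D))   # h = #8000800080008000
LG_N = (N - 1).bit_length()                     # ceil(lg n) = 6
LG_D = (D - 1).bit_length()                     # ceil(lg d) = 2


def brodal_lambda(x):
    """Algorithm B: lambda x = floor(lg x) for 0 < x < 2^N, without multiplication."""
    lam = 0
    for k in range(LG_N - 1, D - 1, -1):        # B1: scale down, for ceil(lg n) > k >= d
        if x >= 1 << (1 << k):
            lam += 1 << k
            x >>= 1 << k
    for k in range(LG_D):                       # B2: replicate x into d fields
        x = (x | (x << (1 << (D + k)))) & MASK
    y = x & ~MU_PACKED & MASK                   # B3: field j keeps x & ~mu_{d,j}
    t = HIGH & (y | ((y | HIGH) - (x ^ y)))     # B4: flag j iff lambda(y_j) = lambda(x_j)
    for k in range(LG_D):                       # B5: move flag j to bit n - d + j
        t = (t + (t << ((1 << (D + k)) - (1 << k)))) & MASK
    return lam + (t >> (N - D))                 # B6


print(brodal_lambda(0x0123456789ABCDEF))        # 56
```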

Another surprisingly efficient broadword algorithm was discovered in 2006 by M. S. Paterson and the author, who considered the problem of identifying all occurrences of the pattern 01^r in a given n-bit binary string. This problem, which is related to storage allocation, is equivalent to computing

$$q = \bar{x} \mathbin{\&} (x \ll 1) \mathbin{\&} (x \ll 2) \mathbin{\&} (x \ll 3) \mathbin{\&} \cdots \mathbin{\&} (x \ll r) \qquad (97)$$


when x = (x_{n−1} ... x_1 x_0)_2 is given. For example, when n = 16, r = 3, and x = (1110111101100111)_2, we have q = (0001000000001000)_2. One might expect intuitively that Ω(log r) bitwise operations would be needed. But in fact the following 21-step computation does the job for all n > r > 0: Let s = ⌈r/2⌉, l = Σ_{k≥0} 2^{ks} mod 2^n, h = (2^{s−1} l) mod 2^n, and a = (Σ_{k≥0} (−1)^{k+1} 2^{2ks}) mod 2^n.

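$$\begin{aligned}
y &\gets h \mathbin{\&} x \mathbin{\&} ((x \mathbin{\&} \bar{h}) + l);\\
t &\gets (x + y) \mathbin{\&} \bar{x} \mathbin{\&} -2^r;\\
u &\gets t \mathbin{\&} a, \qquad v \gets t \mathbin{\&} (a \ll 2s);\\
m &\gets (u - (u \gg r)) \mid (v - (v \gg r));\\
q &\gets t \mathbin{\&} \bigl((x \mathbin{\&} m) + ((t \gg r) \mathbin{\&} \overline{(m \ll 1)})\bigr).
\end{aligned} \qquad (98)$$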

Exercise 111 explains why these machinations are valid. The method has little or no practical value; there’s an easy way to evaluate (97) in 2⌈lg r⌉ + 2 steps, so (98) is not advantageous until r > 512. But (98) is another indication of the unexpected power of broadword methods.
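The pattern search (97) can be written in Python three ways: literally; by a doubling trick that uses about 2⌈lg r⌉ + 2 operations (one possible arrangement, assumed here rather than taken from the text); and by transcribing the 21-step method (98) with s, l, h, and a as defined above. A brute-force comparison on short strings is a convenient check.

```python
def pattern_naive(x, r, n):
    """(97) literally: bit p of q is 1 iff x_p = 0 and x_{p-1} = ... = x_{p-r} = 1."""
    mask = (1 << n) - 1
    q = ~x & mask
    for k in range(1, r + 1):
        q &= (x << k) & mask
    return q


def pattern_doubling(x, r, n):
    """(97) by doubling: a holds the AND of (x << k) for 1 <= k <= w."""
    mask = (1 << n) - 1
    a, w = (x << 1) & mask, 1
    while 2 * w <= r:
        a &= (a << w) & mask             # now covers shifts 1..2w
        w *= 2
    if w < r:
        a &= (a << (r - w)) & mask       # overlapping windows cover shifts 1..r
    return a & ~x & mask


def pattern_paterson_knuth(x, r, n):
    """A transcription of the 21-step method (98); requires n > r > 0."""
    mask = (1 << n) - 1
    s = (r + 1) // 2                                               # ceil(r/2)
    l = sum(1 << (k * s) for k in range((n + s - 1) // s)) & mask
    h = (l << (s - 1)) & mask
    a = sum((-1) ** (k + 1) * (1 << (2 * k * s))
            for k in range(n // (2 * s) + 1)) & mask               # alternating 2s-bit blocks
    y = h & x & ((x & ~h) + l)            # top bit of each aligned s-block of 1s in x
    t = (x + y) & ~x & -(1 << r) & mask   # candidate positions (some runs may be too short)
    u, v = t & a, t & ((a << (2 * s)) & mask)
    m = ((u - (u >> r)) | (v - (v >> r))) & mask   # the r bits below each candidate
    return t & ((x & m) + ((t >> r) & ~(m << 1)))


print(bin(pattern_paterson_knuth(0b1110111101100111, 3, 16)))   # 0b1000000001000
```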

*Lower bounds. Indeed, the existence of so many tricks and techniques makes it natural to wonder whether we’ve only been scratching the surface. Are there many more incredibly fast methods, still waiting to be discovered? A few theoretical results are known by which we can derive certain limitations on what is possible, although such studies are still in their infancy.

Let’s say that a 2-adic chain is a sequence (x_0, x_1, ..., x_r) of 2-adic integers in which each element x_i for i > 0 is obtained from its predecessors via bitwise manipulation. More precisely, we want the steps of the chain to be defined by binary operations

$$x_i = x_{j(i)} \circ_i x_{k(i)} \quad\text{or}\quad x_i = c_i \circ_i x_{k(i)} \quad\text{or}\quad x_i = x_{j(i)} \circ_i c_i, \qquad (99)$$

where each ∘_i is one of the operators $\{+, -, \&, \mid, \oplus, \equiv, \subset, \supset, \not\subset, \not\supset, \overline{\wedge}, \overline{\vee}, \ll, \gg\}$ and each c_i is a constant. Furthermore, when the operator is a left shift or right shift, the amount of shift must be a positive integer constant; operations such as x_j ≪ x_k or c_i ≫ x_k are not permitted. (Without the latter restriction we couldn’t derive meaningful lower bounds, because every 0–1 valued function of a nonnegative integer x would be computable in two steps as “(c ≫ x) & 1” for some constant c.)

Similarly, a broadword chain of width n, also called an n-bit broadword chain, is a sequence (x_0, x_1, ..., x_r) of n-bit numbers subject to essentially the same restrictions, where n is a parameter and all operations are performed modulo 2^n. Broadword chains behave like 2-adic chains in many ways, but subtle differences can arise because of the information loss that occurs at the left of n-bit computations (see exercise 113).

Both types of chains compute a function f(x) = x_r when we start them out with a given value x = x_0. Exercise 114 shows that an mn-bit broadword chain is able to do m essentially simultaneous evaluations of any function that is computable with an n-bit chain. Our goal is to study the shortest chains that are able to evaluate a given function f.


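Any 2-adic or broadword chain (x_0, x_1, ..., x_r) has a sequence of “shift sets” (S_0, S_1, ..., S_r) and “bounds” (B_0, B_1, ..., B_r), defined as follows: Start with S_0 = {0} and B_0 = 1; then for i ≥ 1, let

$$S_i = \begin{cases}
S_{j(i)} \cup S_{k(i)}, & \text{if } x_i = x_{j(i)} \circ_i x_{k(i)};\\
S_{k(i)}, & \text{if } x_i = c_i \circ_i x_{k(i)};\\
S_{j(i)}, & \text{if } x_i = x_{j(i)} \circ_i c_i;\\
S_{j(i)} + c_i, & \text{if } x_i = x_{j(i)} \gg c_i;\\
S_{j(i)} - c_i, & \text{if } x_i = x_{j(i)} \ll c_i;
\end{cases}
\qquad
B_i = \begin{cases}
M_i B_{j(i)} B_{k(i)}, & \text{if } x_i = x_{j(i)} \circ_i x_{k(i)};\\
M_i B_{k(i)}, & \text{if } x_i = c_i \circ_i x_{k(i)};\\
M_i B_{j(i)}, & \text{if } x_i = x_{j(i)} \circ_i c_i;\\
B_{j(i)}, & \text{if } x_i = x_{j(i)} \gg c_i \text{ or } x_i = x_{j(i)} \ll c_i;
\end{cases}
\qquad (100)$$

where M_i = 2 if ∘_i ∈ {+, −} and M_i = 1 otherwise. In the first three cases of these formulas we assume that ∘_i ∉ {≪, ≫}.

For example, consider the following 7-step chain:

    x_i                    S_i            B_i
    x_0 = x                {0}            1
    x_1 = x_0 & −2         {0}            1
    x_2 = x_1 + 2          {0}            2
    x_3 = x_2 ≫ 1          {1}            2
    x_4 = x_2 + x_3        {0, 1}         8
    x_5 = x_4 ≫ 4          {4, 5}         8
    x_6 = x_4 + x_5        {0, 1, 4, 5}   128
    x_7 = x_6 ≫ 4          {4, 5, 8, 9}   128                (101)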

(We encountered this chain in exercise 4.4–9, which proved that these operations will yield x_7 = ⌊x/10⌋ for 0 ≤ x < 160 when performed with 8-bit arithmetic.)
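The rules (100) are mechanical enough to compute. In the following Python sketch the tuple encoding of the chain (101) is a hypothetical format chosen for illustration, and only the operator cases that occur in (101) are handled; it recomputes the S_i and B_i columns and checks that x_7 = ⌊x/10⌋ for 0 ≤ x < 160 under 8-bit arithmetic.

```python
# Chain (101), one tuple per step i = 1..7: (op, j, k, c) means
# x_i = x_j op x_k when k is not None, and x_i = x_j op c otherwise.
CHAIN = [
    ("&", 0, None, -2),
    ("+", 1, None, 2),
    (">>", 2, None, 1),
    ("+", 2, 3, None),
    (">>", 4, None, 4),
    ("+", 4, 5, None),
    (">>", 6, None, 4),
]


def run_chain(x, bits=8):
    """Evaluate the chain, reducing every result modulo 2^bits."""
    mask = (1 << bits) - 1
    xs = [x & mask]
    for op, j, k, c in CHAIN:
        operand = xs[k] if k is not None else c
        if op == "&":
            value = xs[j] & operand
        elif op == "+":
            value = xs[j] + operand
        else:                                   # ">>" by a constant
            value = xs[j] >> operand
        xs.append(value & mask)
    return xs


def shift_sets_and_bounds():
    """S_i and B_i by the rules (100), for the operator cases that occur in (101)."""
    S, B = [{0}], [1]
    for op, j, k, c in CHAIN:
        M = 2 if op in ("+", "-") else 1
        if op == ">>":                          # right shift by c: S + c, bound unchanged
            S.append({s + c for s in S[j]})
            B.append(B[j])
        elif k is not None:                     # x_j op x_k
            S.append(S[j] | S[k])
            B.append(M * B[j] * B[k])
        else:                                   # x_j op c
            S.append(set(S[j]))
            B.append(M * B[j])
    return S, B


S, B = shift_sets_and_bounds()
print(sorted(S[7]), B[7])                       # [4, 5, 8, 9] 128
```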

To begin a theory of lower bounds, let’s notice first that the high-order bits of x = x_0 cannot influence any low-order bits unless we shift them to the right.

Lemma A. Given a 2-adic or broadword chain, let the binary representation of x_i be (... x_{i2} x_{i1} x_{i0})_2. Then bit x_{ip} can depend on bit x_{0q} only if q ≤ p + max S_i.

Proof. By induction on i we can in fact show that, if B_i = 1, bit x_{ip} can depend on bit x_{0q} only if q − p ∈ S_i. Addition and subtraction, which force B_i > 1, allow any particular bit of their operands to affect all bits that lie to the left in the sum or difference, but not those that lie to the right. ■

Corollary I. The function x ∸ 1 cannot be computed by a 2-adic chain, nor can any function for which at least one bit of f(x) depends on an unbounded number of bits of x. ■

Corollary W. An n-bit function f(x) can be computed by an n-bit broadword chain without shifts if and only if x ≡ y (modulo 2^p) implies f(x) ≡ f(y) (modulo 2^p) for 0 < p < n.

Proof. If there are no shifts we have S_i = {0} for all i. Thus bit x_{rp} cannot depend on bit x_{0q} unless q ≤ p. In other words we must have x_r ≡ y_r (modulo 2^p) whenever x_0 ≡ y_0 (modulo 2^p).

Conversely, all such functions are achievable by a sufficiently long chain. Exercise 119 gives shift-free n-bit chains for the functions

$$f_{py}(x) = 2^p\,[x \bmod 2^{p+1} = y], \qquad\text{when } 0 \le p < n \text{ and } 0 \le y < 2^{p+1}, \qquad (102)$$

from which all the relevant functions arise by addition. [H. S. Warren, Jr., generalized this result to functions of m variables in CACM 20 (1977), 439–441.] ■

Shift sets S_i and bounds B_i are important chiefly because of a fundamental lemma that is our principal tool for proving lower bounds:

Lemma B. Let X_{pqr} = {x_r & ⌊2^p − 2^q⌋ | x_0 ∈ V_{pqr}} in an n-bit broadword chain, where

$$V_{pqr} = \{x \mid x \mathbin{\&} \lfloor 2^{p+s} - 2^{q+s} \rfloor = 0 \text{ for all } s \in S_r\} \qquad (103)$$

and p > q. Then |X_{pqr}| ≤ B_r. (Here p and q are integers, possibly negative.)

This lemma states that at most B_r different bit patterns x_{r(p−1)} ... x_{rq} can occur within f(x), when certain intervals of bits in x are constrained to be zero.


Proof. The result certainly holds when r = 0. Otherwise if, for example, x_r = x_j + x_k, we know by induction that |X_{pqj}| ≤ B_j and |X_{pqk}| ≤ B_k. Furthermore V_{pqr} = V_{pqj} ∩ V_{pqk}, since S_r = S_j ∪ S_k. Thus at most B_j B_k possibilities for (x_j + x_k) & ⌊2^p − 2^q⌋ arise when there’s no carry into position q, and at most B_j B_k when there is a carry, making a grand total of at most B_r = 2B_j B_k possibilities altogether. Exercise 122 considers the other cases. ■

We now can prove that the ruler function needs Ω(log log n) steps.

Theorem R. If n = d · 2^d, every n-bit broadword chain that computes ρx for 0 < x < 2^n has more than lg d steps that are not shifts.

Proof. If there are l nonshift steps, we have |S_r| ≤ 2^l and B_r ≤ 2^{2^l − 1}. Apply Lemma B with p = d and q = 0, and suppose |X_{d0r}| = 2^d − t. Then there are t values of k < 2^d such that

$$\{2^k, 2^{k+2^d}, 2^{k+2\cdot 2^d}, \ldots, 2^{k+(d-1)2^d}\} \cap V_{d0r} = \emptyset.$$

But V_{d0r} excludes at most 2^l d of the n possible powers of 2; so t ≤ 2^l.

If l ≤ lg d, Lemma B tells us that 2^d − t ≤ B_r ≤ 2^{d−1}; hence 2^{d−1} ≤ t ≤ 2^l ≤ d. But this is impossible unless d ≤ 2, when the theorem clearly holds. ■

The same proof works also for the binary logarithm function:

Corollary L. If n = d · 2^d > 2, every n-bit broadword chain that computes λx for 0 < x < 2^n has more than lg d steps that are not shifts. ■

By using Lemma B with q > 0 we can derive the stronger lower bound Ω(log n) for bit reversal, and hence for bit permutation in general.


Theorem P. If 2 ≤ g ≤ n, every n-bit broadword chain that computes the g-bit reversal x^R for 0 ≤ x < 2^g has at least ⌊⅓ lg g⌋ steps that are not shifts.

Proof. Assume as above that there are l nonshifts. Let h = ⌊g^{1/3}⌋ and suppose that l < ⌊lg(h + 1)⌋. Then S_r is a set of at most 2^l ≤ ½(h + 1) shift amounts s. We shall apply Lemma B with p = q + h, where p ≤ g and q ≥ 0, thus in g − h + 1 cases altogether. The key observation is that x^R & ⌊2^p − 2^q⌋ is independent of x & ⌊2^{p+s} − 2^{q+s}⌋ whenever there are no indices j and k such that 0 ≤ j, k < h and g − 1 − q − j = q + s + k. The number of “bad” choices of q for which such


indices exist is at most ½(h + 1)h² < g − h; therefore at least one “good” choice of q yields |X_{pqr}| = 2^h. But then Lemma B leads to a contradiction, because we obviously cannot have 2^h ≤ B_r ≤ 2^{(h−1)/2}. ■

Corollary M. Multiplication by certain constants, modulo 2^n, requires Ω(log n) steps in an n-bit broadword chain.

Proof. In Hack 167 of the classic memorandum HAKMEM (M.I.T. A.I. Laboratory, 1972), Richard Schroeppel observed that the operations

$$t \gets ((ax) \bmod 2^n) \mathbin{\&} b, \qquad y \gets ((ct) \bmod 2^n) \gg (n - g) \qquad (104)$$

compute y = x^R whenever n = g² and 0 ≤ x < 2^g, using the constants a = (2^{n+g} − 1)/(2^{g+1} − 1), b = 2^{g−1}(2^n − 1)/(2^g − 1), and c = (2^{n−g} − 1)/(2^{g−1} − 1). (See exercise 123.) ■
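Schroeppel’s two-multiplication reversal (104) can be tried for g = 8 and n = 64. The Python sketch below assumes that case; the comments describe what each constant does there: a makes g copies of x spaced g + 1 bits apart, b picks out bit g − 1 − k of x at the top of field k, and c gathers those bits into the top field.

```python
G = 8
N = G * G                                     # n = g^2 = 64
MASK = (1 << N) - 1
A = (2**(N + G) - 1) // (2**(G + 1) - 1)      # g copies of x, spaced g+1 bits apart
B = 2**(G - 1) * (2**N - 1) // (2**G - 1)     # the top bit of every g-bit field
C = (2**(N - G) - 1) // (2**(G - 1) - 1)      # moves those top bits into the top field


def schroeppel_reverse(x):
    """(104): the g-bit reversal of x, for 0 <= x < 2^g."""
    t = ((A * x) & MASK) & B
    return ((C * t) & MASK) >> (N - G)


print(format(schroeppel_reverse(0b00010110), "08b"))   # 01101000
```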

At this point the reader might well be thinking, “Okay, I agree that broadword chains sometimes have to be asymptotically long. But programmers needn’t be shackled by such chains; we can use other techniques, like conditional branches or references to precomputed tables, which go beyond those restrictions.”

Right. And we’re in luck, because broadword theory can also be extended to more general models of computation. Consider, for example, the following idealization of an abstract reduced-instruction-set computer, called a basic RAM: The machine has n-bit registers r_1, r_2, ..., and n-bit memory words {M[0], ..., M[2^m − 1]}. It can perform the instructions


$$r_i \gets r_j \pm r_k, \quad r_i \gets r_j \circ r_k, \quad r_i \gets r_j \gg r_k, \quad r_i \gets c, \quad r_i \gets M[r_j \bmod 2^m], \quad M[r_j \bmod 2^m] \gets r_i, \qquad (105)$$


where ∘ is any bitwise Boolean operator, and where r_k in the shift instruction is treated as a signed integer in two’s complement notation. The machine is also able to branch if r_i < r_j, treating r_i and r_j as unsigned integers. Its state is the entire contents of all registers and memory, together with a “program counter” that points to the current instruction. Its program begins in a designated state, which may include precomputed tables in memory, and with an n-bit input value x in register r_1. This initial state is called Q(x, 0), and Q(x, t) denotes the state after t instructions have been performed. When the machine stops, r_1 will contain some n-bit value f(x). Given a function f(x), we want to find a lower bound on the least t such that r_1 is equal to f(x) in state Q(x, t), for 0 < x < 2^n.


Theorem R′. Let ε = 2^{−e}. A basic n-bit RAM with memory parameter m < n^{1−ε} requires at least lg lg n − e steps to evaluate the ruler function ρx, as n → ∞.

Proof. Let n = 2^{2^f}, so that m < 2^{2^f − 2^{f−e}}. Exercise 124 explains how an omniscient observer can construct a broadword chain from a certain class of inputs X, in such a way that each x causes the RAM to take the same branches, use the same shift amounts, and refer to the same memory locations. Our earlier methods can then be used to show that this chain has length > f. ■

A skeptical reader may still object that Theorem R′ has no practical value, because lg lg n never exceeds 5 in the real world. To this argument there is no rebuttal. But the following result is slightly more relevant:


Theorem P′. A basic n-bit RAM requires at least ⌊⅓ lg g⌋ steps to compute the g-bit reversal x^R for 0 ≤ x < 2^g, if g ≤ n and

max(m,l+lgn) < 叩 ^ 二 1 )」_ 2 , h= ( 106 ) 

Proof. An argument like the proof of Theorem R′ appears in exercise 125. ■

Lemma B and Theorems R, P, R′, P′ and their corollaries are due to A. Brodnik, P. B. Miltersen, and J. I. Munro, Lecture Notes in Comp. Sci. 1272 (1997), 426–439, based on earlier work of Miltersen in Lecture Notes in Comp. Sci. 1099 (1996), 442–451.

Many unsolved questions remain (see exercises 126–130). For example, does sideways addition require Ω(log n) steps in an n-bit broadword chain? Can the parity function (νx) mod 2, or the majority function [νx > n/2], be computed broadwordwise in O(log log n) steps, or even perhaps in constant time?

An application to directed graphs. Now let’s use some of what we’ve learned, by implementing a simple algorithm. Given a digraph on a set of vertices V, we write u → v when there’s an arc from u to v. The reachability problem is to find all vertices that lie on oriented paths beginning in a specified set Q ⊆ V; in other words, we seek the set

$$R = \{v \mid u \to^{*} v \text{ for some } u \in Q\}, \qquad (107)$$

where u →* v means that there is a sequence of t arcs

$$u = u_0 \to u_1 \to \cdots \to u_t = v, \qquad\text{for some } t \ge 0. \qquad (108)$$

This problem arises frequently in practice. For example, we encountered it in Section 2.3.5 when marking all elements of Lists that are not “garbage.”

If the number of vertices is small, say |V| ≤ 64, we may want to approach the reachability problem in quite a different way than we did before, by working directly with subsets of vertices. Let

$$S[u] = \{v \mid u \to v\} \qquad (109)$$

be the set of successors of vertex u, for all u ∈ V. Then the following algorithm is almost completely different from Algorithm 2.3.5E, yet it solves the same abstract problem:

Algorithm R (Reachability). Given a directed graph, represented by the successor sets S[u] in (109), this algorithm computes the elements R that are reachable from a given set Q.

R1. [Initialize.] Set R ← Q and X ← ∅. (In the following steps, X is the subset of vertices u ∈ R for which we’ve looked at S[u].)

R2. [Done?] If X = R, the algorithm terminates.

R3. [Examine another vertex.] Let u be an element of R \ X. Set X ← X ∪ {u}, R ← R ∪ S[u], and return to step R2. ■

The algorithm is correct because (i) every element placed into R is reachable; (ii) every reachable element u_j in (108) is present in R, by induction on j; and (iii) termination eventually occurs, because step R3 always increases |X|.


To implement Algorithm R we will assume that V = {0, 1, ..., n − 1}, with n ≤ 64. The set X is conveniently represented by the integer σ(X) = Σ{2^u | u ∈ X}, and the same convention works nicely for the other sets Q, R, and S[u]. Notice that the bits of S[0], S[1], ..., S[n − 1] are essentially the adjacency matrix of the given digraph, as explained in Section 7, but in little-endian order: The “diagonal” elements, which tell us whether or not u ∈ S[u], go from right to left. For example, if n = 3 and the arcs are {0→0, 0→1, 1→0, 2→0}, we have S[0] = (011)_2 and S[1] = S[2] = (001)_2, while the adjacency matrix is $\begin{pmatrix}1&1&0\\1&0&0\\1&0&0\end{pmatrix}$.

Step R3 allows us to choose any element of R \ X, so we use the ruler function u ← ρ(σ(R) − σ(X)) to choose the smallest. The bitwise operations require no further trickery when we adapt the algorithm to MMIX:


Program R (Reachability). The input set Q is given in register q, and each successor set S[u] appears in octabyte M_8[succ + 8u]. The output set R will appear in register r; other registers s, t, tt, u, and x hold intermediate results.

    01       SET   r,q        1       R1. Initialize. r ← σ(Q).
    02       SET   x,0        1       x ← σ(∅).
    03       JMP   2F         1       To R2.
    04  3H   SUBU  tt,t,1     |R|     R3. Examine another vertex. tt ← t − 1.
    05       SADD  u,tt,t     |R|     u ← ρt [see (46)].
    06       SLU   s,u,3      |R|     s ← 8u.
    07       LDOU  s,succ,s   |R|     s ← σ(S[u]).
    08       ANDN  tt,t,tt    |R|     tt ← t & ~tt = 2^u.
    09       OR    x,x,tt     |R|     X ← X ∪ {u}; that is, x ← x | 2^u, since x = σ(X).
    10       OR    r,r,s      |R|     R ← R ∪ S[u]; that is, r ← r | s, since r = σ(R).
    11  2H   SUBU  t,r,x      |R|+1   R2. Done? t ← r − x = σ(R \ X), since X ⊆ R.
    12       PBNZ  t,3B       |R|+1   To R3 if R ≠ X. ■


The total running time is (μ + 9υ)|R| + 7υ. By contrast, exercise 131 implements Algorithm R with linked lists; the overall execution time then grows to (3S + 4|R| − 2|Q| + 1)μ + (5S + 12|R| − 5|Q| + 4)υ, where S = Σ_{u∈R} |S[u]|. (But of course that program is also able to handle graphs with millions of vertices.)

Exercise 132 presents another instructive algorithm where bitwise operations work nicely on not-too-large graphs.
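For comparison with Program R, here is a Python sketch of Algorithm R on bit sets, under the same conventions: `succ[u]` holds σ(S[u]) and `q` holds σ(Q). Python’s `t & -t` plays the role of the SUBU/ANDN pair that isolates 2^{ρt}; the function name is illustrative.

```python
def reachable(succ, q):
    """Algorithm R on bit sets: succ[u] = sigma(S[u]), q = sigma(Q); returns sigma(R)."""
    r, x = q, 0                  # R1: R <- Q, X <- empty set
    t = r - x                    # sigma(R \ X), valid because X is a subset of R
    while t:                     # R2: stop when X = R
        bit = t & -t             # 2^u, where u = rho(t) is the smallest element of R \ X
        u = bit.bit_length() - 1
        x |= bit                 # R3: X <- X U {u}
        r |= succ[u]             #     R <- R U S[u]
        t = r - x
    return r


# The 3-vertex example from the text: arcs 0->0, 0->1, 1->0, 2->0.
succ = [0b011, 0b001, 0b001]
print(bin(reachable(succ, 0b100)))   # 0b111
```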

Application to data representation. Computers are binary, but (alas?) the world isn’t. We often must find a way to encode nonbinary data into 0s and 1s. One of the most common problems of this sort is to choose an efficient representation for items that can be in exactly three different states.

Suppose we know that x ∈ {a, b, c}, and we want to represent x by two bits x_l x_r. We could, for example, map a ↦ 00, b ↦ 01, and c ↦ 10. But there are many other possibilities; in fact, there are 4 choices for a, then 3 choices for b, and 2 for c, making 24 altogether. Some of these mappings might be much easier to deal with than others, depending on what we want to do with x.

Given two elements x, y ∈ {a, b, c}, we typically want to compute z = x ∘ y, for some binary operation ∘. If x = x_l x_r and y = y_l y_r then z = z_l z_r, where

$$z_l = f_l(x_l, x_r, y_l, y_r) \quad\text{and}\quad z_r = f_r(x_l, x_r, y_l, y_r); \qquad (110)$$


these Boolean functions f_l and f_r of four variables depend on ∘ and the chosen representation. We seek a representation that makes f_l and f_r easy to compute.

Suppose, for example, that {a, b, c} = {−1, 0, +1} and that ∘ is multiplication. If we decide to use the natural mapping x ↦ x mod 3, namely

$$0 \mapsto 00, \qquad +1 \mapsto 01, \qquad -1 \mapsto 10, \qquad (111)$$

so that x = x_r − x_l, then the truth tables for f_l and f_r are respectively

$$f_l \leftrightarrow \texttt{000*001*010*****} \quad\text{and}\quad f_r \leftrightarrow \texttt{000*010*001*****}. \qquad (112)$$

(There are seven “don’t-cares,” for cases where x_l x_r = 11 and/or y_l y_r = 11.) The methods of Section 7.1.2 tell us how to compute z_l and z_r optimally, namely

$$z_l = (x_l \oplus y_l) \wedge (x_r \oplus y_r), \qquad z_r = (x_l \oplus y_r) \wedge (x_r \oplus y_l); \qquad (113)$$

unfortunately the functions f_l and f_r in (112) are independent, in the sense that they cannot both be evaluated in fewer than C(f_l) + C(f_r) = 6 steps.

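On the other hand the somewhat less natural mapping scheme

$$+1 \mapsto 00, \qquad 0 \mapsto 01, \qquad -1 \mapsto 10 \qquad (114)$$

leads to the transformation functions

$$f_l \leftrightarrow \texttt{001*000*100*****} \quad\text{and}\quad f_r \leftrightarrow \texttt{010*111*010*****}, \qquad (115)$$

and three operations now suffice to do the desired evaluation:

$$z_r = x_r \vee y_r, \qquad z_l = (x_l \oplus y_l) \wedge \bar{z}_r. \qquad (116)$$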

Is there an easy way to discover such improvements? Fortunately we don’t need to try all 24 possibilities, because many of them are basically alike. For example, the mapping x ↦ x_r x_l is equivalent to x ↦ x_l x_r, because the new representation x′_l x′_r = x_r x_l obtained by swapping coordinates makes

$$f'_l(x'_l, x'_r, y'_l, y'_r) = z'_l = z_r = f_r(x_l, x_r, y_l, y_r) = f_r(x'_r, x'_l, y'_r, y'_l);$$

the new transformation functions f′_l and f′_r defined by

$$f'_l(x_l, x_r, y_l, y_r) = f_r(x_r, x_l, y_r, y_l), \qquad f'_r(x_l, x_r, y_l, y_r) = f_l(x_r, x_l, y_r, y_l) \qquad (117)$$

have the same complexity as f_l and f_r. Similarly we can complement a coordinate, letting x′_l x′_r = x_l x̄_r; then the transformation functions turn out to be

$$f'_l(x_l, x_r, y_l, y_r) = f_l(x_l, \bar{x}_r, y_l, \bar{y}_r), \qquad f'_r(x_l, x_r, y_l, y_r) = \bar{f}_r(x_l, \bar{x}_r, y_l, \bar{y}_r), \qquad (118)$$

and again the complexity is essentially unchanged.

Repeated use of swapping and/or complementation leads to eight mappings that are equivalent to any given one. So the 24 possibilities reduce to only three, which we shall call classes I, II, and III:

                  Class I                    Class II                   Class III
    a ↦  00 01 10 11 00 10 01 11   00 01 10 11 00 10 01 11   00 01 10 11 00 10 01 11;
    b ↦  01 00 11 10 10 00 11 01   01 00 11 10 10 00 11 01   11 10 01 00 11 01 10 00;   (119)
    c ↦  10 11 00 01 01 11 00 10   11 10 01 00 11 01 10 00   01 00 11 10 10 00 11 01.




To choose a representation we need consider only one representative of each class. For example, if a = +1, b = 0, and c = −1, representation (111) belongs to class II, and (114) belongs to class I. Class III turns out to have cost 3, like class I. So it appears that representation (114) is as good as any, with z computed by (116), for the 3-element multiplication problem we’ve been studying.

Appearances can, however, be deceiving, because we need not map {a, b, c} into unique two-bit codes. Consider the one-to-many mapping

$$+1 \mapsto 00, \qquad 0 \mapsto 01 \text{ or } 11, \qquad -1 \mapsto 10, \qquad (120)$$

where both 01 and 11 are allowed as representations of zero. The truth tables for f_l and f_r are now quite different from (112) and (115), because all inputs are legal but some outputs can be arbitrary:

$$f_l \leftrightarrow \texttt{0*1*****1*0*****} \quad\text{and}\quad f_r \leftrightarrow \texttt{0101111101011111}. \qquad (121)$$

And in fact, this approach needs just two operations, instead of the three in (116):

$$z_l = x_l \oplus y_l, \qquad z_r = x_r \vee y_r. \qquad (122)$$

A moment’s thought shows that indeed, these operations obviously yield the product z = x · y when the three elements {+1, 0, −1} are represented as in (120).
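Formulas (113), (116), and (122) can be verified by brute force over all legal codes. In this Python sketch each encoding maps a value to its list of allowed two-bit codes (x_l, x_r), and a formula is accepted when every legal input pair yields a legal code for the product; the names are illustrative.

```python
# Allowed two-bit codes (x_l, x_r) for each value, per mapping.
ENC_111 = {0: [(0, 0)], +1: [(0, 1)], -1: [(1, 0)]}
ENC_114 = {+1: [(0, 0)], 0: [(0, 1)], -1: [(1, 0)]}
ENC_120 = {+1: [(0, 0)], 0: [(0, 1), (1, 1)], -1: [(1, 0)]}   # zero has two codes


def mul_113(xl, xr, yl, yr):
    return (xl ^ yl) & (xr ^ yr), (xl ^ yr) & (xr ^ yl)


def mul_116(xl, xr, yl, yr):
    zr = xr | yr
    return (xl ^ yl) & (1 - zr), zr          # 1 - zr complements a single bit


def mul_122(xl, xr, yl, yr):
    return xl ^ yl, xr | yr


def realizes(enc, formula):
    """True if formula maps every legal pair of codes to a legal code of x*y."""
    return all(formula(*cx, *cy) in enc[x * y]
               for x in enc for y in enc
               for cx in enc[x] for cy in enc[y])


print(realizes(ENC_111, mul_113), realizes(ENC_114, mul_116), realizes(ENC_120, mul_122))
# True True True
```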

Such nonunique mappings add 36 more possibilities to the 24 that we had before. But again, they reduce under “2-cube equivalence” to a small number of equivalence classes. First there are three classes that we call IV_a, IV_b, and IV_c, depending on which element has an ambiguous representation:

                  Class IV_a                 Class IV_b                 Class IV_c
    a ↦  0* 0* 1* 1* *0 *0 *1 *1   11 10 01 00 11 01 10 00   10 11 00 01 01 11 00 10;
    b ↦  10 11 00 01 01 11 00 10   0* 0* 1* 1* *0 *0 *1 *1   11 10 01 00 11 01 10 00;   (123)
    c ↦  11 10 01 00 11 01 10 00   10 11 00 01 01 11 00 10   0* 0* 1* 1* *0 *0 *1 *1.




(Representation (120) belongs to Class IV_b. Classes IV_a and IV_c don’t work well for z = x · y.) Then there are three further classes with only four mappings each:

                  Class V_a                  Class V_b                  Class V_c
    a ↦  00/11 01/10 01/10 00/11   10    11    00    01      01    00    11    10;
    b ↦  01    00    11    10      00/11 01/10 01/10 00/11   10    11    00    01;     (124)
    c ↦  10    11    00    01      01    00    11    10      00/11 01/10 01/10 00/11.

(Here 00/11 means that either of the codes 00 and 11 may be used, and similarly for 01/10.)


These classes are a bit of a nuisance, because the indeterminacy in their truth tables cannot be expressed simply in terms of don’t-cares as we did in (121). For example, if we try

$$+1 \mapsto 00 \text{ or } 11, \qquad 0 \mapsto 01, \qquad -1 \mapsto 10, \qquad (125)$$

which is the first mapping in Class V_a, there are binary variables pqrst such that

$$f_l \leftrightarrow \texttt{p01q000010r1s01t} \quad\text{and}\quad f_r \leftrightarrow \texttt{p10q111101r0s10t}. \qquad (126)$$
Furthermore, mappings of classes V_a, V_b, and V_c almost never turn out to be better than the mappings of the other six classes (see exercise 138). Still, representatives of all nine classes must be examined before we can be sure that an optimal mapping has been found.

In practice we often want to perform several different operations on ternary-valued variables, not just a single operation like multiplication. For example, we might want to compute max(x, y) as well as x · y. With representation (120), the best we can do is z_l = x_l ∧ y_l, z_r = (x_l ∧ y_r) ∨ (x_r ∧ (y_l ∨ y_r)); but the “natural” mapping (111) now shines, with z_l = x_l ∧ y_l, z_r = x_r ∨ y_r. Class III turns out to have cost 4; other classes are inferior. To choose between classes II, III, and IV_b in this case, we need to know the relative frequencies of x · y and max(x, y). And if we add min(x, y) to the mix, classes II, III, and IV_b compute it with the respective costs 2, 5, 5; hence (111) looks better yet.

The ternary max and min operations arise also in other contexts, such as the three-valued logic developed by Jan Łukasiewicz in 1917. [See his Selected Works, edited by L. Borkowski (1970), 84–88, 153–178.] Consider the logical values “true,” “false,” and “maybe,” denoted respectively by 1, 0, and *. Łukasiewicz defined the three basic operations of conjunction, disjunction, and implication on these values by specifying the tables

                x ∧ y          x ∨ y          x ⇒ y
    y:          0 * 1          0 * 1          0 * 1
    x = 0:      0 0 0          0 * 1          1 1 1
    x = *:      0 * *          * * 1          * 1 1          (127)
    x = 1:      0 * 1          1 1 1          0 * 1
For these operations the methods above show that the binary representation

$$0 \mapsto 00, \qquad * \mapsto 01, \qquad 1 \mapsto 11 \qquad (128)$$

works well, because we can compute the logical operations thus:

$$\begin{gathered}
x_l x_r \wedge y_l y_r = (x_l \wedge y_l)(x_r \wedge y_r), \qquad x_l x_r \vee y_l y_r = (x_l \vee y_l)(x_r \vee y_r),\\
x_l x_r \Rightarrow y_l y_r = \bigl((\bar{x}_r \vee y_r) \wedge (\bar{x}_l \vee y_l)\bigr)(\bar{x}_l \vee y_r).
\end{gathered} \qquad (129)$$

Of course x need not be an isolated ternary value in this discussion; we often 
want to deal with ternary vectors x = X\X 2 .. . x n： where each Xj is either a, b. 
or c. Such ternary vectors are conveniently represented by two binary vectors 

~ *^11*^21 • • • 工 nl and X r = Xi r X2r • • • ^nr i (l3^) 

where Xj ^ XjiXj r as above. We could also pack the ternary values into two-bit 
fields of a single vector, 


(128) 

( 12 9) 


X = X\iX\ r X2lX2r - - - ^nl^nr ] 


031) 


that would work fine if ， say, we’re doing Lukasiewicz logic with the operations A 
and V but not Usually, however. the two-vector approach of ( 130 ) is better, 
because it lets us do bitwise calculations without shifting and masking. 
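As an illustration of (128)–(130), here is a small Python sketch that stores a ternary vector as a pair of integers $(x_l, x_r)$. The layout is an assumption of this sketch, not of the text: bit j of each integer holds position j. Conjunction and disjunction are then single bitwise operations on each half. Implication needs complements, so they are confined to n bits by a mask.

```python
# Łukasiewicz logic on ternary vectors, stored as two bit vectors (x_l, x_r)
# as in (130), with the encoding 0 -> 00, * -> 01, 1 -> 11 of (128).
# Assumption (not from the text): bit j of each integer holds position j.
ENC = {'0': (0, 0), '*': (0, 1), '1': (1, 1)}
DEC = {bits: ch for ch, bits in ENC.items()}

def pack(s):
    """Convert a string over {'0', '*', '1'} into the pair (x_l, x_r)."""
    xl = xr = 0
    for j, ch in enumerate(s):
        l, r = ENC[ch]
        xl |= l << j
        xr |= r << j
    return xl, xr

def unpack(v, n):
    """Convert a pair (x_l, x_r) of n-bit vectors back into a string."""
    xl, xr = v
    return ''.join(DEC[((xl >> j) & 1, (xr >> j) & 1)] for j in range(n))

def l_and(x, y):      # first formula of (129): min, position by position
    return x[0] & y[0], x[1] & y[1]

def l_or(x, y):       # second formula of (129): max, position by position
    return x[0] | y[0], x[1] | y[1]

def l_implies(x, y, n):
    """Third formula of (129); complements are confined to n bits by a mask."""
    mask = (1 << n) - 1
    nxl, nxr = ~x[0] & mask, ~x[1] & mask
    return (nxl | y[0]) & (nxr | y[1]), nxl | y[1]

# All nine (x, y) pairs at once, one pair per position.
x, y = pack("000***111"), pack("0*10*10*1")
print(unpack(l_and(x, y), 9), unpack(l_or(x, y), 9), unpack(l_implies(x, y, 9), 9))
# prints: 0000**0*1 0*1**1111 111*110*1
```

Each printed string matches the corresponding row-by-row reading of the tables (127).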
Applications to data structures. Bitwise operations offer many efficient ways
to represent elements of data and the relationships between them. For example, 
chess-playing programs often use a “bit board” to represent the positions of 
pieces (see exercise 143). 

In Chapter 8 we shall discuss an important data structure developed by Peter van Emde Boas for representing a dynamically changing subset of integers between 0 and N. Insertions, deletions, and other operations such as "find the largest element less than x" can be done in O(log log N) steps with his methods; the general idea is to organize the full structure recursively as √N substructures for subsets of intervals of size √N, together with an auxiliary structure that tells which of those intervals are occupied. [See Information Processing Letters 6 (1977), 80–82; also P. van Emde Boas, R. Kaas, and E. Zijlstra, Math. Systems Theory 10 (1977), 99–127.] Bitwise operations make those computations fast.

Hierarchical data can sometimes be arranged so that the links between 
elements are implicit rather than explicit. For example, we studied “heaps” 
in Section 5.2.3, where n elements of a sequential array implicitly have a binary 
tree structure like 



when, say, n = 10. (Node numbers are shown here both in decimal and binary notation.) There is no need to store pointers in memory to relate node j of a heap to its parent (which is node $j \gg 1$ if $j \ne 1$), or to its sibling (which is node $j \oplus 1$ if $j \ne 1$), or to its children (which are nodes $j \ll 1$ and $(j \ll 1) + 1$ if those numbers don't exceed n), because a simple calculation leads directly from j to any desired neighbor.
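The heap navigation rules just quoted translate directly into a few lines of Python. This is a minimal sketch. The function names are illustrative. The bound check on the sibling is an addition of this sketch, because node n has no sibling when n is even.

```python
# Implicit links in an n-node heap (root = node 1), using the shifts from the text.
def heap_parent(j):
    return j >> 1 if j != 1 else None

def heap_sibling(j, n):
    # The text gives j XOR 1 for j != 1; the extra check s <= n is an addition
    # of this sketch, since node n has no sibling when n is even.
    s = j ^ 1
    return s if j != 1 and s <= n else None

def heap_children(j, n):
    return [c for c in (j << 1, (j << 1) + 1) if c <= n]

n = 10
print(heap_parent(10), heap_sibling(4, n), heap_children(4, n), heap_children(5, n))
# prints: 5 5 [8, 9] [10]
```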

Similarly, a sideways heap provides implicit links for another useful family of n-node binary tree structures, typified by

[Diagram (133): the sideways heap for n = 10, with node numbers shown in binary]


when n = 10. (We sometimes need to go beyond n when moving from a node to its parent, as in the path from 10 to 12 to 8 shown here.) Heaps and sideways heaps can both be regarded as nodes 1 to n of infinite binary tree structures: The heap with $n = \infty$ is rooted at node 1 and has no leaves; by contrast, the sideways heap with $n = \infty$ has infinitely many leaves 1, 3, 5, ..., but no root(!).

The leaves of a sideways heap are the odd numbers, and their parents are the odd multiples of 2. The grandparents of leaves, similarly, are the odd multiples of 4; and so on. Thus the ruler function $\rho j$ tells how high node j is above leaf level.

The parent of node j in the infinite sideways heap is easily seen to be node

$$(j - k) \mid (k \ll 1), \qquad\text{where } k = j \mathbin{\&} -j; \qquad (134)$$
$$\rho l = \max\{\rho x \mid i \le x \le j\} = \lambda(j \mathbin{\&} -i), \qquad (138)$$

which relates the ρ and λ functions. (See exercise 146.) We can therefore use formula (137) with $h = \lambda(j \mathbin{\&} -i)$ to calculate l.
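A small Python sketch of this calculation follows. Python's unbounded integers stand in here for the 2-adic arithmetic, so $-i$ behaves like an infinite string of 1s above the lowest 1 bit of i. The helper names `lam` and `sh_nca` are illustrative only.

```python
def lam(x):
    """λx = ⌊lg x⌋ for x > 0."""
    return x.bit_length() - 1

def sh_nca(i, j):
    """Nearest common ancestor of nodes i and j (both >= 1) in the infinite sideways heap."""
    if i > j:
        i, j = j, i
    h = lam(j & -i)               # (138): the height of the answer
    return ((j >> h) | 1) << h    # (137): the ancestor of j at that height

print(sh_nca(10, 1), sh_nca(5, 7), sh_nca(3, 5), sh_nca(13, 14))
# prints: 8 6 4 14
```

The sketch uses the right-hand form of (137). Only one λ evaluation and three shift-or operations are needed, with no loop over ancestors.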

Subtle extensions of this approach lead to an asymptotically efficient algorithm that finds nearest common ancestors in any oriented forest whose arcs grow dynamically [D. Harel and R. E. Tarjan, SICOMP 13 (1984), 338–355]. Baruch Schieber and Uzi Vishkin [SICOMP 17 (1988), 1253–1262] subsequently discovered a much simpler way to compute nearest common ancestors in an arbitrary (but fixed) oriented forest, using an attractive and instructive blend of bitwise and algorithmic techniques that we shall consider next.

Recall that an oriented forest with m trees and n vertices is an acyclic digraph with n − m arcs. There is at most one arc from each vertex; the vertices with out-degree zero are the roots of the trees. We say that v is the parent of u when $u \to v$, and v is an ancestor of u when $u \to^* v$. Two vertices have a common ancestor if and only if they belong to the same tree. Vertex w is called the nearest common ancestor of u and v when we have

$$u \to^* z \text{ and } v \to^* z \quad\text{if and only if}\quad w \to^* z. \qquad (139)$$

Schieber and Vishkin preprocess the given forest, mapping its vertices into a sideways heap S of size n by computing three quantities for each vertex v:

$\pi v$, the rank of v in preorder ($1 \le \pi v \le n$);

$\beta v$, a node of the sideways heap S ($1 \le \beta v \le n$);

$\alpha v$, a $(1 + \lambda n)$-bit routing code ($1 \le \alpha v < 2^{1+\lambda n}$).

If $u \to v$ we have $\pi u > \pi v$ by the definition of preorder. Node $\beta v$ is defined to be the nearest common ancestor of all sideways-heap nodes $\pi u$ such that v is an ancestor of vertex u. And we define

$$\alpha v = \sum\{2^{\rho\beta w} \mid v \to^* w\}. \qquad (140)$$


this quantity is j rounded to the nearest odd multiple of $2^{1+\rho j}$. And the children are

$$j - (k \gg 1) \quad\text{and}\quad j + (k \gg 1) \qquad (135)$$

when j is even. In general the descendants of node j form a closed interval

$$[\,j - 2^{\rho j} + 1 \,..\, j + 2^{\rho j} - 1\,] \qquad (136)$$

arranged as a complete binary tree of $2^{1+\rho j} - 1$ nodes. The ancestor of node j at height h is node

$$(j \mid (1 \ll h)) \mathbin{\&} -(1 \ll h) = ((j \gg h) \mid 1) \ll h \qquad (137)$$

when $h \ge \rho j$. Notice that the symmetric order of the nodes, also called inorder, is just the natural order 1, 2, 3, ....
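Formulas (134)–(137) can be collected into a tiny Python sketch of sideways-heap navigation. Unbounded Python integers model the infinite heap. The function names are hypothetical.

```python
def rho(j):
    """Ruler function ρj: the number of trailing 0s of j > 0."""
    return (j & -j).bit_length() - 1

def sh_parent(j):
    k = j & -j
    return (j - k) | (k << 1)            # (134)

def sh_children(j):
    k = j & -j
    if k == 1:                           # odd nodes are leaves
        return None
    return j - (k >> 1), j + (k >> 1)    # (135)

def sh_descendants(j):
    k = j & -j
    return range(j - k + 1, j + k)       # (136): [j - 2^ρj + 1 .. j + 2^ρj - 1]

def sh_ancestor(j, h):
    if h < rho(j):
        raise ValueError("h must be at least rho(j)")
    return ((j >> h) | 1) << h           # (137), right-hand form

print(sh_parent(10), sh_parent(12), sh_children(12), list(sh_descendants(12)), sh_ancestor(10, 2))
# prints: 12 8 (10, 14) [9, 10, 11, 12, 13, 14, 15] 12
```

The first two results retrace the path 10 → 12 → 8 mentioned for n = 10.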

Dov Harel noted these properties in his Ph.D. thesis (U. of California, Irvine, 1980), and observed that the nearest common ancestor of any two nodes of a sideways heap can also be easily calculated. Indeed, if node l is the nearest common ancestor of nodes i and j, where $i \le j$, there is a remarkable identity
For example, here's an oriented forest with ten vertices and two trees:

[Diagram (141): an oriented forest on the vertices A, B, ..., J]

Each node has been labeled with its preorder rank, from which we can compute the β and α codes:

      v = A    B    C    D    E    F    G    H    I    J
     πv = 0001 1000 0010 0100 1001 0011 0101 0111 1010 0110
     βv = 0100 1000 0010 0100 1010 0011 0110 0111 1010 0110
     αv = 0100 1000 0110 0100 1010 0111 0110 0101 1010 0110

Notice that, for instance, $\beta A = 4 = 0100$ because the preorder ranks of the descendants of A are {1, 2, 3, 4, 5, 6, 7}. And $\alpha H = 0101$ because the ancestors of H have β codes $\{\beta H, \beta D, \beta A\} = \{0111, 0100\}$. One can prove without difficulty that the mapping $v \mapsto \beta v$ satisfies the following key properties:

i) If $u \to v$ in the forest, then $\beta u$ is a descendant of $\beta v$ in S.

ii) If several vertices have the same value of $\beta v$, they form a path in the forest.

Property (ii) holds because exactly one child u of v has $\beta u = \beta v$ when $\beta v \ne \pi v$.
Now let's imagine placing every vertex v of the forest into node $\beta v$ of S:


If k vertices map into node j, we can arrange them into a path

$$v_0 \to v_1 \to \cdots \to v_{k-1} \to v_k, \qquad\text{where } \beta v_0 = \beta v_1 = \cdots = \beta v_{k-1} = j. \qquad (143)$$

These paths are illustrated in (142); for example, $J \to G \to D$ is a path in (141), and it appears with node $0110 = \beta J = \beta G$.

The preprocessing algorithm also computes a table $\tau j$ for all nodes j of S, containing pointers to the vertices $v_k$ at the tail ends of (143):

      j = 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010
     τj = Λ    A    C    Λ    Λ    D    D    Λ    Λ    B
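The tables above can be recomputed mechanically. The following Python sketch is not the O(n) method of exercise 149. It is a plain recursive walk. It assumes a forest reconstructed from the tables above: A has children C and D, C has child F, D has children G and H, G has child J, B has child E, and E has child I. Because the descendants of v occupy a contiguous range of preorder ranks, $\beta v$ is the nearest common ancestor in S of the two ends of that range. This can be computed with (138) and (137).

```python
# Hypothetical reconstruction of forest (141) from the tables above:
# each vertex lists its children in the order that preorder visits them.
CHILDREN = {'A': ['C', 'D'], 'C': ['F'], 'D': ['G', 'H'], 'G': ['J'],
            'B': ['E'], 'E': ['I']}
ROOTS = ['A', 'B']

def lam(x):
    return x.bit_length() - 1            # λx = ⌊lg x⌋, x > 0

def rho(x):
    return (x & -x).bit_length() - 1     # ruler function ρx, x > 0

def preprocess(children, roots):
    """Return π, β, α (dicts keyed by vertex) and τ (dict keyed by node of S)."""
    pi, size, parent, order = {}, {}, {}, []

    def visit(v):                        # recursive preorder walk
        order.append(v)
        pi[v] = len(order)
        size[v] = 1
        for c in children.get(v, []):
            parent[c] = v
            visit(c)
            size[v] += size[c]

    for r in roots:
        parent[r] = None
        visit(r)

    beta, alpha, tau = {}, {}, {}
    for v in order:                      # every parent precedes its children
        i, j = pi[v], pi[v] + size[v] - 1   # preorder ranks of v's descendants
        h = lam(j & -i)                  # (138): height of their common ancestor in S
        beta[v] = ((j >> h) | 1) << h    # (137)
        p = parent[v]
        alpha[v] = (alpha[p] if p is not None else 0) | (1 << rho(beta[v]))  # (140)
        if p is None or beta[p] != beta[v]:
            tau[beta[v]] = p             # top of path (143); None plays the role of Λ
    return pi, beta, alpha, tau

pi, beta, alpha, tau = preprocess(CHILDREN, ROOTS)
for name, table in (('pi', pi), ('beta', beta), ('alpha', alpha)):
    print(name, ' '.join(format(table[v], '04b') for v in 'ABCDEFGHIJ'))
```

The three printed rows reproduce the πv, βv, and αv rows shown above. Entries missing from `tau` correspond to Λ.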

Exercise 149 shows that all four tables $\pi v$, $\beta v$, $\alpha v$, and $\tau j$ can be prepared in O(n) steps. And once those tables are ready, they contain just enough information to identify the nearest common ancestor of any two given vertices quickly:

Algorithm V (Nearest common ancestors). Suppose $\pi v$, $\beta v$, $\alpha v$, and $\tau j$ are known for all n vertices v of an oriented forest, and for $1 \le j \le n$. A dummy vertex Λ is also assumed to be present, with $\pi\Lambda = \beta\Lambda = \alpha\Lambda = 0$. This algorithm computes the nearest common ancestor z of any given vertices x and y, returning $z = \Lambda$ if x and y belong to different trees. We assume that the values $\lambda j = \lfloor \lg j \rfloor$ have been precomputed for $1 \le j \le n$, and that $\lambda 0 = \lambda n$.
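V1. [Find common height.] If $\beta x \le \beta y$, set $h \leftarrow \lambda(\beta y \mathbin{\&} -\beta x)$; otherwise set $h \leftarrow \lambda(\beta x \mathbin{\&} -\beta y)$. (See (138).)

V2. [Find true height.] Set $k \leftarrow \alpha x \mathbin{\&} \alpha y \mathbin{\&} -(1 \ll h)$, then $h \leftarrow \lambda(k \mathbin{\&} -k)$.

V3. [Find βz.] Set $j \leftarrow ((\beta x \gg h) \mid 1) \ll h$. (Now $j = \beta z$, if $z \ne \Lambda$.)

V4. [Find $\hat x$ and $\hat y$.] (We now seek the lowest ancestors of x and y in node j.) If $j = \beta x$, set $\hat x \leftarrow x$; otherwise set $l \leftarrow \lambda(\alpha x \mathbin{\&} ((1 \ll h) - 1))$ and $\hat x \leftarrow \tau(((\beta x \gg l) \mid 1) \ll l)$. Similarly, if $j = \beta y$, set $\hat y \leftarrow y$; otherwise set $l \leftarrow \lambda(\alpha y \mathbin{\&} ((1 \ll h) - 1))$ and $\hat y \leftarrow \tau(((\beta y \gg l) \mid 1) \ll l)$.

V5. [Find z.] Set $z \leftarrow \hat x$ if $\pi\hat x \le \pi\hat y$, otherwise $z \leftarrow \hat y$. ■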

These artful dodges obviously exploit ( 137 ); exercise 152 explains why they work. 
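To see steps V1–V5 in action, here is a Python transcription run on the example tables above. The tables are typed in by hand. `None` stands for the dummy vertex Λ. The convention λ0 = λn is built into the helper `lam`. All names are illustrative. This is a sketch for checking the example, not a tuned implementation.

```python
# Tables copied from the example above (vertex -> (πv, βv, αv)) and τj;
# None stands for the dummy vertex Λ, whose π, β, α are all 0.
TABLE = {
    'A': (1, 0b0100, 0b0100), 'B': (8, 0b1000, 0b1000), 'C': (2, 0b0010, 0b0110),
    'D': (4, 0b0100, 0b0100), 'E': (9, 0b1010, 0b1010), 'F': (3, 0b0011, 0b0111),
    'G': (5, 0b0110, 0b0110), 'H': (7, 0b0111, 0b0101), 'I': (10, 0b1010, 0b1010),
    'J': (6, 0b0110, 0b0110),
}
TAU = {0b0010: 'A', 0b0011: 'C', 0b0110: 'D', 0b0111: 'D', 0b1010: 'B'}  # others are Λ
N = len(TABLE)

def lam(x):
    """λx = ⌊lg x⌋ for x > 0, with the convention λ0 = λn used by Algorithm V."""
    return (x if x else N).bit_length() - 1

def pi(v):
    return 0 if v is None else TABLE[v][0]

def nca(x, y):
    _, bx, ax = TABLE[x]
    _, by, ay = TABLE[y]
    # V1. Find common height.
    h = lam(by & -bx) if bx <= by else lam(bx & -by)
    # V2. Find true height.
    k = ax & ay & -(1 << h)
    h = lam(k & -k)
    # V3. Find βz.
    j = ((bx >> h) | 1) << h
    # V4. Find the lowest ancestors of x and y in node j.
    def lowest(v, bv, av):
        if j == bv:
            return v
        l = lam(av & ((1 << h) - 1))
        return TAU.get(((bv >> l) | 1) << l)     # a missing entry means Λ
    xh, yh = lowest(x, bx, ax), lowest(y, by, ay)
    # V5. Find z.
    return xh if pi(xh) <= pi(yh) else yh

print(nca('H', 'J'), nca('F', 'J'), nca('H', 'I'))
# prints: D A None
```

The last call involves vertices in different trees. Step V2 then finds k = 0, and the convention λ0 = λn steers the computation to Λ.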


Sideways heaps can also be used to implement an interesting type of priority queue that J. Katajainen and F. Vitale call a "navigation pile," illustrated here for n = 10:

[Diagram (144): a navigation pile with n = 10; the leaf positions are 1, 3, 5, ..., 19]

Data elements go into the leaf positions 1, 3, ..., 2n − 1 of the sideways heap; they can be many bits wide, and they can appear in any order. By contrast, each branch position 2, 4, 6, ... contains a pointer to its largest descendant. And the novel point is that these pointers take up almost no extra space (fewer than two bits per item of data, on average), because only one bit is needed for pointers 2, 6, 10, ..., only two bits for pointers 4, 12, 20, ..., and only $\rho j$ bits for pointer j in general. (See exercise 153.) Thus the navigation pile requires very little memory, and it behaves nicely with respect to cache performance on a typical computer.
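The pointer-width claim can be made concrete with a static sketch. This is not Katajainen and Vitale's implementation. It only builds the pointers once, with no insertions or deletions. It keeps each branch's pointer as a $\rho j$-bit offset in a dictionary instead of packing the offsets into a bit array. Branch j has exactly $2^{\rho j}$ leaf descendants, the odd numbers in the interval (136), so an offset in $[0, 2^{\rho j})$ identifies one of them.

```python
def rho(j):
    """Ruler function: number of trailing 0s of j > 0 (also the pointer width at j)."""
    return (j & -j).bit_length() - 1

def build_navigation_pile(data):
    """Static sketch: leaf 2i+1 holds data[i]; each branch j stores a rho(j)-bit
    offset naming the leaf, among its 2**rho(j) leaf descendants, with the largest item."""
    n = len(data)
    root = 1
    while root < n:                 # smallest power of 2 that is >= n covers leaves 1..2n-1
        root <<= 1
    offsets = {}

    def best(j):
        """Leaf position of the largest item below node j, or None if that subtree is empty."""
        if j & 1:
            return j if (j - 1) // 2 < n else None
        k = j & -j
        left, right = best(j - (k >> 1)), best(j + (k >> 1))
        if right is None or (left is not None
                             and data[(left - 1) // 2] >= data[(right - 1) // 2]):
            win = left
        else:
            win = right
        if win is not None:
            offsets[j] = (win - (j - k + 1)) >> 1    # 0 <= offset < 2**rho(j)
        return win

    best(root)
    return root, offsets

def max_leaf(j, offsets):
    """Decode branch j's stored offset back into a leaf position."""
    return j - (j & -j) + 1 + 2 * offsets[j]

data = [3, 9, 4, 1, 7, 2, 8, 5, 6, 0]
root, offsets = build_navigation_pile(data)
print(root, max_leaf(root, offsets), sum(rho(j) for j in offsets))
# prints: 16 3 21
```

Because this sketch also stores pointers at the branch nodes beyond position 2n − 1 that lie on the path to the root, its bit total for n = 10 comes out slightly above 2n. The text's "fewer than two bits per item, on average" refers to the structure as a whole, not to this toy build.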


Fig. 13. Two views of five lines in the hyperbolic plane.


*Cells in the hyperbolic plane. Hyperbolic geometry suggests an instructive implicit data structure that has a rather different flavor. The hyperbolic plane is a fascinating example of non-Euclidean geometry that is conveniently viewed by projecting its points into the interior of a circle. Its straight lines then become circular arcs, which meet the rim at right angles. For example, the lines PP′, QQ′, and RR′ in Fig. 13 intersect at points O, A, B, and those points form a triangle. Lines SQ′ and QQ′ are parallel: They never touch, but their points get closer and closer together. Line QT is also parallel to QQ′.
We get different views by focusing on different center points. For example,
the second view in Fig. 13 puts O smack in the center. Notice that if a line passes 
through the very center, it remains straight after being projected; such diameter- 
spanning chords are the special case of a “circular arc” whose radius is infinite. 

Most of Euclid's axioms for plane geometry remain valid in the hyperbolic plane. For example, exactly one line passes through any two distinct points; and if point A lies on line PP′ there's exactly one line QQ′ such that angle PAQ has any given value θ for 0° < θ < 180°. But Euclid's famous fifth postulate does not hold: If point C is not on line QQ′, there always are exactly two lines through C that are parallel to QQ′. Furthermore there are many pairs of lines, like RR′ and SQ′ in Fig. 13, that are totally disjoint or ultraparallel, in the sense that their points never become arbitrarily close. [These properties of the hyperbolic plane were discovered by G. Saccheri in the early 1700s, and made rigorous by N. I. Lobachevsky, J. Bolyai, and C. F. Gauss a century later.]

Quantitatively speaking, when points are projected onto the unit disk $|z| < 1$, the arc that meets the circle at $e^{i\theta}$ and $e^{-i\theta}$ has center at $\sec\theta$ and radius $\tan\theta$. The actual distance between two points whose projections are z and z′ is $d(z, z') = \ln(|1 - \bar z z'| + |z - z'|) - \ln(|1 - \bar z z'| - |z - z'|)$. Thus objects far from the center appear dramatically shrunken when we see them near the circle's rim.

The sum of the angles of a hyperbolic triangle is always less than 180°. For example, the angles at O, A, and B in Fig. 13 are respectively 90°, 45°, and 36°. Ten such 36°-45°-90° triangles can be placed together to make a regular pentagon with 90° angles at each corner. And four such pentagons fit snugly together at their corners, allowing us to tile the entire hyperbolic plane with right regular pentagons (see Fig. 14). The edges of these pentagons form an interesting family of lines, every two of which are either ultraparallel or perpendicular; so we have a grid structure analogous to the unit squares of the ordinary plane. We call it the pentagrid, because each cell now has five neighbors instead of four.

There's a nice way to navigate in the pentagrid using Fibonacci numbers, based on ideas of Maurice Margenstern [see F. Herrmann and M. Margenstern, Theoretical Comp. Sci. 296 (2003), 345–351]. Instead of the ordinary Fibonacci sequence $\langle F_n\rangle$, however, we shall use the negaFibonacci numbers $\langle F_{-n}\rangle$, namely

$$F_{-1} = 1, \quad F_{-2} = -1, \quad F_{-3} = 2, \quad F_{-4} = -3, \quad \ldots, \quad F_{-n} = (-1)^{n-1}F_n. \qquad (145)$$

Exercise 1.2.8–34 introduced the Fibonacci number system, in which every nonnegative integer x can be written uniquely in the form

$$x = F_{k_1} + F_{k_2} + \cdots + F_{k_r}, \qquad\text{where } k_1 \gg k_2 \gg \cdots \gg k_r \gg 0; \qquad (146)$$

here $j \gg k$ means $j \ge k + 2$. But there's also a negaFibonacci number system, which suits our purposes better: Every integer x, whether positive, negative, or zero, can be written uniquely in the form

$$x = F_{k_1} + F_{k_2} + \cdots + F_{k_r}, \qquad\text{where } k_1 \ll k_2 \ll \cdots \ll k_r \ll 1. \qquad (147)$$

For example, $4 = 5 - 1 = F_{-5} + F_{-2}$ and $-2 = -3 + 1 = F_{-4} + F_{-1}$. This representation can conveniently be expressed as a binary code $\alpha = \ldots a_3a_2a_1$,
Fig. 14. The pentagrid, in which identical pentagons tile the hyperbolic plane.


A circular regular tiling, confined on all sides by infinitely small shapes, is really wonderful.


— M. C. ESCHER, letter to George Escher (9 November 1958) 



standing for $N(\alpha) = \sum_k a_k F_{-k}$, with no two 1s in a row. For example, here are the negaFibonacci representation codes of all integers between −14 and +15:

   −14 = 10010100    −8 = 100000    −2 = 1001     4 = 10010      10 = 1001000
   −13 = 10010101    −7 = 100001    −1 = 10       5 = 10000      11 = 1001001
   −12 = 101010      −6 = 100100     0 = 0        6 = 10001      12 = 1000010
   −11 = 101000      −5 = 100101     1 = 1        7 = 10100      13 = 1000000
   −10 = 101001      −4 = 1010       2 = 100      8 = 10101      14 = 1000001
    −9 = 100010      −3 = 1000       3 = 101      9 = 1001010    15 = 1000100

As in the negadecimal system (see 4.1–(6) and (7)), we can tell whether x is negative or not by seeing if its representation has an even or odd number of digits.
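The table and the parity rule are easy to check by brute force. The following Python sketch generates $F_{-1}, F_{-2}, \ldots$ from the recurrence $F_{-k} = F_{-(k-2)} - F_{-(k-1)}$. It evaluates every code of at most 12 digits that has no two adjacent 1s, and indexes the codes by the integer they represent. The 12-digit limit and the names are choices of this sketch.

```python
F = [1, -1]                       # F[k-1] holds F_{-k}
while len(F) < 40:
    F.append(F[-2] - F[-1])       # F_{-k} = F_{-(k-2)} - F_{-(k-1)}

def N(code):
    """Value of a negaFibonacci code; bit k-1 of `code` is the digit a_k."""
    return sum(F[k] for k in range(code.bit_length()) if (code >> k) & 1)

def is_code(code):
    """A valid code has no two 1s in a row."""
    return code & (code >> 1) == 0

# All valid codes of at most 12 digits, indexed by the integer they represent.
CODES = {N(c): c for c in range(1 << 12) if is_code(c)}

def nf(x):
    """NegaFibonacci code of x as a bit string (small |x| only)."""
    return format(CODES[x], 'b')

print(' '.join(f'{x}={nf(x)}' for x in (-14, -2, 4, 9, 15)))
# prints: -14=10010100 -2=1001 4=10010 9=1001010 15=1000100
```

If two different codes had the same value, the dictionary would lose an entry. The accompanying test checks that this never happens, which is the uniqueness claimed in (147) for the codes examined.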

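The predecessor $\alpha-$ and successor $\alpha+$ of any negaFibonacci binary code α can be computed recursively by using the rules

$$(\alpha 01)- = \alpha 00, \quad (\alpha 000)- = \alpha 010, \quad (\alpha 100)- = \alpha 001, \quad (\alpha 10)- = (\alpha-)01;$$
$$(\alpha 10)+ = \alpha 00, \quad (\alpha 00)+ = \alpha 01, \quad (\alpha 1)+ = (\alpha-)0. \qquad (148)$$

(See exercise 154.) But ten elegant 2-adic steps do the calculation directly:

$$y \leftarrow x \oplus \bar\mu_0, \quad z \leftarrow y \oplus (y \pm 1), \quad\text{where } x = (\alpha)_2;$$
$$z \leftarrow z \mid (x \mathbin{\&} (z \ll 1));$$
$$w \leftarrow x \oplus z \oplus ((z + 1) \gg 2); \quad\text{then } w = (\alpha\pm)_2. \qquad (149)$$

We just use y − 1 in the top line to get the predecessor, y + 1 to get the successor.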
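Here is a direct Python transcription of (149). The infinite 2-adic constant $\bar\mu_0 = (\ldots101010)_2$ is truncated to 64 bits. This truncation is an assumption of the sketch, and it is ample for codes of up to about 60 digits.

```python
MU0_BAR = int('10' * 32, 2)    # (...1010)_2 truncated to 64 bits: the complement of μ0

def nf_step(x, delta):
    """Successor (delta=+1) or predecessor (delta=-1) of the negaFibonacci code x,
    following the ten 2-adic steps of (149)."""
    y = x ^ MU0_BAR
    z = y ^ (y + delta)
    z |= x & (z << 1)
    return x ^ z ^ ((z + 1) >> 2)

def nf_succ(x):
    return nf_step(x, +1)

def nf_pred(x):
    return nf_step(x, -1)

print(format(nf_succ(0b10101), 'b'), format(nf_pred(0b1010), 'b'))   # 8 -> 9, -4 -> -5
# prints: 1001010 100101
```

The first line of (149) turns the low-order part of x into a run that $y \pm 1$ can flip all at once. The remaining lines repair that run so that no two 1s end up adjacent.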
And now here's the point: A negaFibonacci code can be assigned to each cell of the pentagrid in such a way that the codes of its five neighbors are easy to compute. Let's call the neighbors n, s, e, w, and o, for "north," "south," "east," "west," and "other." If α is the code assigned to a given cell, we define

Qn = Q 2 ， CKs = Q 2 ， Oi e = Q^ +， ~ — ; (I。。) 

thus a sn = and also a en = (a01) n = a. The “other” direction is trickier: 



if a & 1 = 1 ; 
if a&l = 0 . 


051 ) 


For example, $1000_o = 101001$ and $101001_o = 1000$. This mysterious interloper lies between north and east when α ends with 1, but between north and west when α ends with 0.

If we choose any cell and label it with code 0, and if we also choose an orientation so that its neighbors are n, e, s, w, and o in clockwise order, rules (150) and (151) will assign consistent labels to every cell of the pentagrid. (See exercise 160.) For example, the vicinity of a cell labeled 1000 will look like this:



The code labels do not, however, identify cells uniquely, because infinitely many cells receive the same label. (Indeed, we clearly have $0_n = 0_s = 0$ and $1_w = 1_o = 1$.) To get a unique identifier, we attach a second coordinate so that each cell's full name has the form (α, y), where y is an integer. When y is constant and α ranges over all negaFibonacci codes, the cells (α, y) form a more-or-less hook-shaped strip whose edges take a 90° turn next to cell (0, y). In general, the five neighbors of (α, y) are $(\alpha, y)_n = (\alpha_n, y + \delta_n(\alpha))$, $(\alpha, y)_s = (\alpha_s, y + \delta_s(\alpha))$, $(\alpha, y)_e = (\alpha_e, y + \delta_e(\alpha))$, $(\alpha, y)_w = (\alpha_w, y + \delta_w(\alpha))$, and $(\alpha, y)_o = (\alpha_o, y + \delta_o(\alpha))$, where

$$\delta_n(\alpha) = [\alpha = 0], \quad \delta_s(\alpha) = -[\alpha = 0], \quad \delta_e(\alpha) = 0, \quad \delta_w(\alpha) = -[\alpha = 1];$$

$$\delta_o(\alpha) = \begin{cases} \mathrm{sign}(\alpha_o - \alpha_n)\,[\alpha_o \mathbin{\&} \alpha_n = 0], & \text{if } \alpha \mathbin{\&} 1 = 1;\\ \mathrm{sign}(\alpha_o - \alpha_w)\,[\alpha_o \mathbin{\&} \alpha_w = 0], & \text{if } \alpha \mathbin{\&} 1 = 0. \end{cases} \qquad (153)$$


(See the illustration below.) Bitwise operations now allow us to surf the entire hyperbolic plane with ease. On the other hand, we could also ignore the y coordinates as we move, thereby wrapping around a "hyperbolic cylinder" of pentagons; the α coordinates define an interesting multigraph on the set of all negaFibonacci codes, in which every vertex has degree 5.
Bitmap graphics. It's fun to write programs that deal with pictures and shapes, because they involve our left and right brains simultaneously. When image data is involved, the results can be engrossing even if there are bugs in our code.

The book you are now reading was typeset by software that treated each page as a gigantic matrix of 0s and 1s, called a "raster" or "bitmap," containing millions of square picture elements called "pixels." The rasters were transmitted to printing machines, causing tiny dots of ink to be placed wherever a 1 appeared in the matrix. Physical properties of ink and paper caused those small clusters of dots to look like smooth curves; but each pixel's basic squareness becomes evident if we enlarge the images tenfold, as in the letter 'A' shown in Fig. 15(a).

With bitwise operations we can achieve special effects like "custering," in which the black pixels disappear when they are surrounded on all sides:


Fig. 15. The letter A, before and after custering.
This operation, introduced by R. A. Kirsch, L. Cahn, C. Ray, and G. E. Urban [Proc. Eastern Joint Computer Conf. 12 (1957), 221–229], can be expressed as

$$\mathrm{custer}(X) = X \mathbin{\&} {\sim}\bigl((X \downarrow 1) \mathbin{\&} (X \gg 1) \mathbin{\&} (X \ll 1) \mathbin{\&} (X \uparrow 1)\bigr), \qquad (155)$$

where $X \downarrow 1$ and $X \uparrow 1$ stand respectively for the result of shifting the bitmap X down or up by one row. Let us write

$$X^N = X \downarrow 1, \quad X^W = X \gg 1, \quad X^E = X \ll 1, \quad X^S = X \uparrow 1 \qquad (156)$$

for the 1-pixel shifts of a bitmap X. Then, for example, the symbolic expression '$X^N \mathbin{\&} (X^S \mid \bar X^E)$' evaluates to 1 in those pixel positions whose northern neighbor is black, and which also have either a black neighbor on the south side or a white neighbor to the east. With these abbreviations, (155) takes the form

$$\mathrm{custer}(X) = X \mathbin{\&} {\sim}(X^N \mathbin{\&} X^W \mathbin{\&} X^E \mathbin{\&} X^S), \qquad (157)$$

which can also be expressed as $X \mathbin{\&} (\bar X^N \mid \bar X^W \mid \bar X^E \mid \bar X^S)$.
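A minimal Python sketch of (157) follows. It assumes a layout that the text does not fix: a bitmap is a list of row integers, row 0 is the top, and within a row the leftmost pixel is the most significant of `width` bits. Under that layout $X^W$ is the row shifted right, $X^E$ is the row shifted left and masked, and $X^N$, $X^S$ are the rows above and below. Pixels outside the picture count as white.

```python
def custer(rows, width):
    """(157): keep a black pixel unless its four rook-neighbors are all black."""
    mask = (1 << width) - 1
    out = []
    for r, x in enumerate(rows):
        xn = rows[r - 1] if r > 0 else 0              # X^N: the row above
        xs = rows[r + 1] if r + 1 < len(rows) else 0  # X^S: the row below
        xw = x >> 1                                   # X^W: neighbor to the west
        xe = (x << 1) & mask                          # X^E: neighbor to the east
        out.append(x & ~(xn & xw & xe & xs))
    return out

square = [0b00000, 0b01110, 0b01110, 0b01110, 0b00000]
print([format(r, '05b') for r in custer(square, 5)])
# prints: ['00000', '01110', '01010', '01110', '00000']
```

Only the center of the 3 × 3 block is surrounded on all four sides, so only that pixel disappears.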

Every pixel has four "rook-neighbors," with which it shares an edge at the top, left, right, or bottom. It also has eight "king-neighbors," with which it shares at least one corner point. For example, the king-neighbors that lie to the northeast of all pixels in a bitmap X can be denoted by $X^{NE}$, which is equivalent to $(X^N)^E$ in pixel algebra. Notice that we also have $X^{NE} = (X^E)^N$.

A 3 × 3 cellular automaton is an array of pixels that changes dynamically via a sequence of local transformations, all performed simultaneously: The state of each pixel at time t + 1 depends entirely on its state at time t and the states of its king-neighbors at that time. Thus the automaton defines a sequence of bitmaps $X^{(1)}, X^{(2)}, \ldots$ that lead from any given initial state $X^{(0)}$, where

$$X^{(t+1)} = f\bigl(X^{(t)NW}, X^{(t)N}, X^{(t)NE}, X^{(t)W}, X^{(t)}, X^{(t)E}, X^{(t)SW}, X^{(t)S}, X^{(t)SE}\bigr) \qquad (158)$$

and f is any bitwise Boolean function of nine variables. Fascinating patterns often emerge in this way. For example, after Martin Gardner introduced John Conway's game of Life to the world in 1970, more computer time was probably devoted to studying its implications than to any other computational task during the next several years, although the people paying the computer bills were rarely told! (See exercise 167.)

There are $2^{512}$ Boolean functions of nine variables, so there are $2^{512}$ different 3 × 3 cellular automata. Many of them are trivial, but most of them probably have such complicated behavior that they are humanly impossible to understand. Fortunately there also are many cases that do turn out to be useful in practice, and much easier to justify on economic grounds than the simulation of a game.

For example, algorithms for recognizing alphabetic characters, fingerprints, or similar patterns often make use of a "thinning" process, which removes excess black pixels and reduces each component of the image to an underlying skeleton that is comparatively simple to analyze. Several authors have proposed cellular automata for this problem, beginning with D. Rutovitz [J. Royal Stat. Society A129 (1966), 512–513], who suggested a 4 × 4 scheme. But parallel algorithms are notoriously subtle, and flaws tended to turn up after various methods had
Fig. 16. Example results of Guo and Hall's 3 × 3 automaton for thinning the components of a bitmap. ("Hollow" pixels were originally black.)




been published. For example, at least two of the black pixels in a component like a 2 × 2 square should be removed, yet a symmetrical scheme will erroneously erase all four.
A satisfactory solution to the thinning problem was finally found by Z. Guo and R. W. Hall [CACM 32 (1989), 359–373, 759], using a 3 × 3 automaton that invokes alternate rules on odd and even steps. Consider the function

$$f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE}) = x \wedge \bar g(x_{NW}, \ldots, x_W, x_E, \ldots, x_{SE}), \qquad (159)$$

where g = 1 only in the following 37 configurations surrounding a black pixel:

[Diagrams: the 37 configurations in which g = 1]

Then we use (158), but with $f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE})$ replaced by its 180° rotation $f(x_{SE}, x_S, x_{SW}, x_E, x, x_W, x_{NE}, x_N, x_{NW})$ on even-numbered steps. The process stops when two consecutive cycles make no change.

With this rule Guo and Hall proved that the 3 × 3 automaton will preserve the connectivity structure of the image, in a strong sense that we will discuss below. Furthermore their algorithm obviously leaves an image intact if it is already so thin that it contains no three pixels that are king-neighbors of each other. On the other hand it usually succeeds in "removing the meat off the bones" of each black component, as shown in Fig. 16. Slightly thinner thinning is obtained in certain cases if we add four additional configurations

[Diagrams (160): four additional configurations]

to the 37 listed above. In either case the function g can be evaluated with a Boolean chain of length 25. (See exercises 170–172.)

In general, the black pixels of an image can be grouped into segments or components that are kingwise connected, in the sense that any black pixel can be reached from any other pixel of its component by a sequence of king moves through black pixels. The white pixels also form components, which are rookwise connected: Any two white cells of a component are mutually reachable via rook moves that touch nothing black. It's best to use different kinds of connectedness for white and black, in order to preserve the topological concepts of "inside" and "outside" that are familiar from continuous geometry [see A. Rosenfeld, JACM 17 (1970), 146–160]. If we imagine that the corner points of a raster are black, an infinitely thin black curve can cross between pixels at a corner, but a white curve cannot. (We could also imagine white corner points, which would lead to rookwise connectivity for black and kingwise connectivity for white.)
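A short Python sketch makes the inside/outside argument tangible. It counts components by breadth-first search. The test picture is a hypothetical "diamond" of four black pixels that touch only at corners, padded with a white border. With king moves for black and rook moves for white, the diamond is one closed black curve with a separate white inside. With king moves for white as well, the inside would leak to the outside.

```python
from collections import deque

KING = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
ROOK = [(-1, 0), (0, -1), (0, 1), (1, 0)]

def count_components(grid, color, moves):
    """Number of connected components of pixels equal to `color` under the given moves."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != color or seen[r][c]:
                continue
            count += 1                          # a new component starts here
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                pr, pc = queue.popleft()
                for dr, dc in moves:
                    qr, qc = pr + dr, pc + dc
                    if (0 <= qr < rows and 0 <= qc < cols and not seen[qr][qc]
                            and grid[qr][qc] == color):
                        seen[qr][qc] = True
                        queue.append((qr, qc))
    return count

def pad(picture):
    """Surround a picture with a white border so that the background is one component."""
    width = len(picture[0]) + 2
    return [[0] * width] + [[0] + row + [0] for row in picture] + [[0] * width]

diamond = pad([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(count_components(diamond, 1, KING), count_components(diamond, 0, ROOK),
      count_components(diamond, 0, KING))
# prints: 1 2 1
```

The last number shows why the two colors need different rules: kingwise white connectivity would let the "inside" pixel escape through a corner.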
Fig. 17. The shrinking of a Cheshire cat
An amusing algorithm for shrinking a picture while preserving its connectivity, except that isolated black or white pixels disappear, was presented by S. Levialdi in CACM 15 (1972), 7–10; an equivalent algorithm, but with black and white reversed, had also appeared in T. Beyer's Ph.D. thesis (M.I.T., 1969). The idea is to use a cellular automaton with the simple transition function

$$f(x_{NW}, x_N, x_{NE}, x_W, x, x_E, x_{SW}, x_S, x_{SE}) = \bigl(x \wedge (x_W \vee x_{SW} \vee x_S)\bigr) \vee (x_W \wedge x_S) \qquad (161)$$

at each step. This formula is actually a 2 × 2 rule, but we still need a 3 × 3 window if we want to keep track of the cases when a one-pixel component goes away.

For example, the 25 × 30 picture of a Cheshire cat in Fig. 17(a) has seven kingwise black components: the outline of its head, the two earholes, the two eyes, the nose, and the smile. The result after one application of (161) is shown in Fig. 17(b): Seven components remain, but there's an isolated point in one ear, and the other earhole will become isolated after the next step. Hence Fig. 17(c) has only five components. After six steps the cat loses its nose, and even the smile will be gone at time 14. Sadly, the last bit of cat will vanish during step 46.

At most M + N − 1 transitions will wipe out any M × N picture, because the lowest visible northwest-to-southeast diagonal line moves relentlessly upward each time. Exercises 176 and 177 prove that different components will never merge together and interfere with each other.
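Rule (161) needs only the west, southwest, and south neighbors, so one step is a handful of bitwise operations per row. This Python sketch uses the same hypothetical layout as a row-list bitmap: row 0 at the top, the leftmost pixel as the most significant bit, and white pixels beyond the edges. A 3 × 3 black square then disappears after exactly 3 + 3 − 1 = 5 steps, which matches the M + N − 1 bound.

```python
def levialdi_step(rows):
    """One application of (161) to a bitmap given as a list of row integers.

    Row 0 is the top row; the leftmost pixel is the most significant bit,
    so shifting a row right brings each pixel's western neighbor into place.
    Pixels beyond the picture are taken to be white.
    """
    out = []
    for r, x in enumerate(rows):
        below = rows[r + 1] if r + 1 < len(rows) else 0
        xw = x >> 1             # western neighbor
        xs = below              # southern neighbor
        xsw = below >> 1        # southwestern neighbor
        out.append((x & (xw | xsw | xs)) | (xw & xs))
    return out

def steps_to_vanish(rows):
    """Number of transitions until every pixel is white."""
    t = 0
    while any(rows):
        rows = levialdi_step(rows)
        t += 1
    return t

square = [0b00000, 0b01110, 0b01110, 0b01110, 0b00000]
print(steps_to_vanish(square))
# prints: 5
```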

Of course this cubic-time cellular method isn’t the fastest way to count or 
identify the components of a picture. We can actually do that job “online,” 
while looking at a large image one row at a time, not bothering to keep all of 
the previously seen rows in memory if we don’t wish to look at them again. 

While we're analyzing the components we might as well also record the relationships between them. Let's assume that only finitely many black pixels are present. Then there's an infinite component of white pixels called the background. Black components adjacent to the background constitute the main objects of the image. And these objects may in turn have holes, which may serve as a background for another level of objects, and so on. Thus the connected components of any finite picture form a hierarchy: an oriented tree, rooted at the background. Black components appear at the odd-numbered levels of this tree, and white components at the even-numbered levels, alternating between
and we'll be ready to scan row five. A comparison of rows four and five will then show that ❽ and ❻ should merge into ❽, but that new components ❽ and ③ should also be launched. Exercise 179 contains full details about an instructive algorithm that properly updates the current tree as new rows are input. Additional information can also be computed on the fly: For example, we could determine the area of each component, the locations of its first and last pixels, the smallest enclosing rectangle, and/or its center of gravity.


During the shrinking process of Fig. 17, components disappear in the order

©, {❽，②，③ } (all at time 3), ❺，❽，❾，❻ ,(f), ❽. 

Suppose we want to analyze the components of such a picture by reading one row at a time. After we've seen four rows the result-so-far will be
*Filling. Let's complete our quick tour of raster graphics by considering how to fill regions that are bounded by straight lines and/or simple curves. Particularly efficient algorithms are available when the curves are built up from "conic sections" (circles, ellipses, parabolas, or hyperbolas), as in classical geometry.

In keeping with geometric tradition, we shall adopt Cartesian coordinates (x, y) in the following discussion, instead of speaking about rows or columns of pixels: An increase of x will signify a move to the right, while an increase of y will move upward. More significantly, we will focus on the edges between square pixels, instead of on the pixels themselves. Edges run between integer points (x, y) and (x′, y′) of the plane when $|x - x'| + |y - y'| = 1$. Each pixel is bounded by the four edges of the closed path $(x, y) \to (x-1, y) \to (x-1, y-1) \to (x, y-1) \to (x, y)$. Experience has shown that algorithms for filling contours become simpler and faster when we concentrate on the edge transitions between white and black, instead of on the black pixels of a custerized boundary. (See, for example, the discussion by B. D. Ackland and N. Weste in IEEE Trans. C-30 (1981), 41–47.)

Consider a continuous curve $z(t) = (x(t), y(t))$ that is traced out as t varies from 0 to 1. We assume that the curve doesn't intersect itself for $0 \le t < 1$, and that $z(0) = z(1)$. The famous Jordan curve theorem [C. Jordan, Cours d'analyse 3 (1887), 587–594; O. Veblen, Trans. Amer. Math. Soc. 6 (1905), 83–98] states that every such curve divides the plane into two regions, called the inside and the outside. We can "digitize" z(t) by forcing it to travel along edges between pixels; then we obtain an approximation in which the inside pixels are black and the outside pixels are white. This digitization process essentially replaces the original curve by the sequence of integer points

$$\mathrm{round}(z(t)) = \bigl(\lfloor x(t) + \tfrac12 \rfloor, \lfloor y(t) + \tfrac12 \rfloor\bigr), \qquad\text{for } 0 \le t \le 1. \qquad (164)$$

The curve can be perturbed slightly, if necessary, so that z(t) never passes exactly through the center of a pixel. Then the digitized curve takes discrete steps along pixel edges as t grows; and a pixel lies inside the digitization if and only if its center lies inside the original continuous curve $\{z(t) \mid 0 \le t \le 1\}$.
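A crude but instructive way to watch (164) at work is to sample t finely and record each time round(z(t)) changes. The following Python sketch does this. The sampling step of $10^{-5}$ is an arbitrary choice. The test curve is the ellipse $x(t) = 20\cos 2\pi t$, $y(t) = 10\sin 2\pi t$. A real filler would not sample; this sketch only makes the sequence of integer points visible.

```python
import math

def digitize(x, y, steps=100_000):
    """Sample t in [0, 1]; return the successive values of round(z(t)) from (164)
    and the sampled t at which each one first appears."""
    def rnd(t):
        return (math.floor(x(t) + 0.5), math.floor(y(t) + 0.5))
    pts, times = [rnd(0.0)], [0.0]
    for i in range(1, steps + 1):
        t = i / steps
        p = rnd(t)
        if p != pts[-1]:
            pts.append(p)
            times.append(t)
    return pts, times

pts, times = digitize(lambda t: 20 * math.cos(2 * math.pi * t),
                      lambda t: 10 * math.sin(2 * math.pi * t))
print(pts[:7])
print([round(t, 3) for t in times[1:7]])
# prints: [(20, 0), (20, 1), (20, 2), (19, 2), (19, 3), (19, 4), (18, 4)]
#         [0.008, 0.024, 0.036, 0.04, 0.057, 0.062]
```

Each recorded move changes only one coordinate by 1, which reflects the claim that the digitized curve steps along pixel edges.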

For example, the equations $x(t) = 20\cos 2\pi t$ and $y(t) = 10\sin 2\pi t$ define an ellipse. Its digitization, round(z(t)), starts at (20, 0) when t = 0, then jumps to (20, 1) when $t \approx .008$ and $10\sin 2\pi t = 0.5$. Then it proceeds to the points (20, 2), (19, 2), (19, 3), (19, 4), (18, 4), ..., (20, −1), (20, 0), as t increases through the values .024, .036, .040, .057, .062, ..., .976, .992:

[Diagram (165): the digitized ellipse]


The horizontal edges of such a boundary are conveniently represented by bit vectors H(y) for each y; for example, H(10) = ...000000111111111111000000... and H(9) = ...011111000000000000111110... in (165). If the ellipse is filled with black to obtain a bitmap B, the H vectors mark transitions between black and white, so we have the symbolic relation

$$H = B \oplus (B \uparrow 1). \qquad (166)$$

Conversely, it's easy to obtain B when the H vectors are given:

$$B(y) = H(y_{\max}) \oplus H(y_{\max} - 1) \oplus \cdots \oplus H(y + 1)$$
$$\qquad = H(y_{\min}) \oplus H(y_{\min} + 1) \oplus \cdots \oplus H(y). \qquad (167)$$

Notice that $H(y_{\min}) \oplus H(y_{\min} + 1) \oplus \cdots \oplus H(y_{\max})$ is the zero vector, because each bitmap is white at both top and bottom. Notice further that the analogous vertical edge vectors V(x) are redundant: They satisfy the formulas $V = B \oplus (B \ll 1)$ and $B = V^{\oplus}$ (see exercise 36), but we need not bother to keep track of them.
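Relations (166) and (167) amount to taking XOR differences between adjacent rows and then prefix XORs. Here is a Python sketch of both directions. It assumes that rows are listed from bottom to top, that `H[i]` marks the transitions between row i and the row below it (so that the two forms of (167) hold as written), and that the picture has white rows at its top and bottom.

```python
def horizontal_edges(B):
    """(166): H[i] = B[i] ^ B[i-1]; rows are listed bottom to top and the
    (imaginary) row below B[0] is white."""
    return [row ^ (B[i - 1] if i else 0) for i, row in enumerate(B)]

def fill_from_bottom(H):
    """Second line of (167): B(y) = H(y_min) ^ H(y_min + 1) ^ ... ^ H(y)."""
    B, acc = [], 0
    for h in H:
        acc ^= h
        B.append(acc)
    return B

def fill_from_top(H):
    """First line of (167): B(y) = H(y_max) ^ ... ^ H(y + 1); needs a white top row."""
    B, acc = [0] * len(H), 0
    for i in range(len(H) - 1, -1, -1):
        B[i] = acc
        acc ^= H[i]
    return B

B = [0, 0b0110, 0b1111, 0b0110, 0]
H = horizontal_edges(B)
print(H)
# prints: [0, 6, 9, 9, 6]
```

Both fill functions recover B from H, and the XOR of all the H vectors is 0, as noted after (167).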

Conic sections are easier to deal with than most other curves, because we
can readily eliminate the parameter t. For example, the ellipse that led to (165)
can be defined by the equation $(x/20)^2 + (y/10)^2 = 1$, instead of using sines
and cosines. Therefore pixel (x, y) should be black if and only if its center point
$(x - \tfrac12, y - \tfrac12)$ lies inside the ellipse, if and only if $(x - \tfrac12)^2/400 + (y - \tfrac12)^2/100 - 1 < 0$.

In general, every conic section is the set of points for which F(x, y) = 0,
when F is an appropriate quadratic form. Therefore there’s a quadratic form

$$Q(x, y) = F(x - \tfrac12, y - \tfrac12) = ax^2 + bxy + cy^2 + dx + ey + f \qquad (168)$$

that is negative at the integer point (x, y) if and only if pixel (x, y) lies on a
given side of the digitized curve.

For practical purposes we may assume that the coefficients (a, b, …, f) of Q
are not-too-large integers. Then we’re in luck, because the exact value of Q(x, y)
is easy to compute. In fact, as pointed out by M. L. V. Pitteway [Comp. J.
10 (1967), 282–289], there’s a nice “three-register algorithm” by which we can
quickly track the boundary points: Let x and y be integers, and suppose we’ve got
the values of Q(x, y), $Q_x(x, y)$, and $Q_y(x, y)$ in three registers $(Q, Q_x, Q_y)$, where

$$Q_x(x, y) = 2ax + by + d \quad\text{and}\quad Q_y(x, y) = bx + 2cy + e \qquad (169)$$

are $\partial Q/\partial x$ and $\partial Q/\partial y$. We can then move to any adjacent integer point, because
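$$\begin{aligned}
Q(x \pm 1, y) &= Q(x, y) \pm Q_x(x, y) + a, &\quad Q(x, y \pm 1) &= Q(x, y) \pm Q_y(x, y) + c,\\
Q_x(x \pm 1, y) &= Q_x(x, y) \pm 2a, &\quad Q_x(x, y \pm 1) &= Q_x(x, y) \pm b,\\
Q_y(x \pm 1, y) &= Q_y(x, y) \pm b, &\quad Q_y(x, y \pm 1) &= Q_y(x, y) \pm 2c. \qquad (170)
\end{aligned}$$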

Furthermore we can divide the contour into separate pieces, in each of which x(t)
and y(t) are both monotonic. For example, when the ellipse (165) travels from
(20, 0) to (0, 10), the value of x decreases while y increases; thus we need only
move from (x, y) to (x − 1, y) or to (x, y + 1). If registers (Q, R, S) respectively
hold $(Q, Q_x - a, Q_y + c)$, a move to (x − 1, y) simply sets $Q \gets Q - R$, $R \gets R - 2a$,
and $S \gets S - b$; a move to (x, y + 1) is just as quick. With care, this idea leads to
a blindingly fast way to discover the correctly digitized edges of any conic curve.
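The register updates can be checked mechanically. The Python sketch below is our own illustration; `make_q` and `walk_left_up` are hypothetical names. It evaluates Q, $Q_x$, and $Q_y$ directly from (168) and (169). It then follows a sequence of moves to the left or upward using only the additions just described, with registers (Q, R, S) holding $(Q, Q_x - a, Q_y + c)$. It also confirms the integer form of the ellipse used in the next paragraph.

```python
def make_q(a, b, c, d, e, f):
    """Return functions Q, Q_x, Q_y for Q(x,y) = ax^2 + bxy + cy^2 + dx + ey + f."""
    q = lambda x, y: a * x * x + b * x * y + c * y * y + d * x + e * y + f
    qx = lambda x, y: 2 * a * x + b * y + d
    qy = lambda x, y: b * x + 2 * c * y + e
    return q, qx, qy

def walk_left_up(coeffs, x, y, moves):
    """Follow 'L' (to x-1) and 'U' (to y+1) moves with register updates only.

    Registers hold (Q, R, S) = (Q(x,y), Q_x(x,y) - a, Q_y(x,y) + c).
    Returns the final (x, y, Q, R, S).
    """
    a, b, c = coeffs[:3]
    q, qx, qy = make_q(*coeffs)
    Q, R, S = q(x, y), qx(x, y) - a, qy(x, y) + c
    for m in moves:
        if m == "L":                      # (x, y) -> (x-1, y)
            Q, R, S = Q - R, R - 2 * a, S - b
            x -= 1
        else:                             # "U": (x, y) -> (x, y+1)
            Q, R, S = Q + S, R + b, S + 2 * c
            y += 1
    return x, y, Q, R, S

q, qx, qy = make_q(4, 0, 16, -4, -16, -1595)   # the ellipse of (165)
print(q(20, 0), q(21, 0))
```

Because each update uses only additions, the registers stay exact for any integer coefficients. The test below uses a form with b ≠ 0 to exercise every term.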
For example, the quadratic form Q(x, y) for ellipse (165) is $4x^2 + 16y^2 - (4x + 16y + 1595)$, when we integerize its coefficients. We have Q(20, 0) =
F(19.5, −0.5) = −75 and Q(21, 0) = +85; therefore pixel (20, 0), whose center is
(19.5, −0.5), is inside the ellipse, but pixel (21, 0) isn’t. Let’s zoom in closer:


(171)  [Illustration: a close-up of the boundary near the point (21, 0).]

The boundary can be deduced without examining Q at very many points. In
fact, we don’t need to look at Q(21, 0), because we know that all edges between
(20, 0) and (0, 10) must go either upwards or to the left. First we test Q(20, 1)
and find it negative (−75); so we move up. Also Q(20, 2) is negative (−43), so
we go up again. Then we test Q(20, 3), and find it positive (21); so we move left.
And so on. Only the Q values −75, −43, 21, −131, −35, 93, −51, … actually
need to be examined, if we’ve set the three-register method up properly.

Algorithm T (Three-register algorithm for conics). Given two integer points
(x, y) and (x′, y′), and an integer quadratic form Q as in (168), this algorithm
decides how to digitize a portion of the conic section defined by F(x, y) = 0,
where $F(x, y) = Q(x + \tfrac12, y + \tfrac12)$. It creates |x′ − x| horizontal edges and |y′ − y|
vertical edges, which form a path from (x, y) to (x′, y′). We assume that

i) Real-valued points (ξ, η) and (ξ′, η′) exist such that F(ξ, η) = F(ξ′, η′) = 0.

ii) The curve travels from (ξ, η) to (ξ′, η′) monotonically in both coordinates.

iii) $x = \lfloor \xi + \tfrac12 \rfloor$, $y = \lfloor \eta + \tfrac12 \rfloor$, $x' = \lfloor \xi' + \tfrac12 \rfloor$, and $y' = \lfloor \eta' + \tfrac12 \rfloor$.

iv) If we traverse the curve from (ξ, η) to (ξ′, η′), we see F < 0 on our left.

T1. [Initialize.] If x = x′, go to T11; if y = y′, go to T10. If x < x′ and y < y′,
set $Q \gets Q(x+1, y+1)$, $R \gets Q_x(x+1, y+1) + a$, $S \gets Q_y(x+1, y+1) + c$, and
go to T2. If x < x′ and y > y′, set $Q \gets Q(x+1, y)$, $R \gets Q_x(x+1, y) + a$,
$S \gets Q_y(x+1, y) - c$, and go to T3. If x > x′ and y < y′, set
$Q \gets Q(x, y+1)$, $R \gets Q_x(x, y+1) - a$, $S \gets Q_y(x, y+1) + c$, and go to T4. If
x > x′ and y > y′, set $Q \gets Q(x, y)$, $R \gets Q_x(x, y) - a$, $S \gets Q_y(x, y) - c$,
and go to T5.

T2. [Right or up.] If Q < 0, do T9; otherwise do T6. Repeat until interrupted.
T3. [Down or right.] If Q < 0, do T7; otherwise do T9. Repeat until interrupted.
T4. [Up or left.] If Q < 0, do T6; otherwise do T8. Repeat until interrupted.
T5. [Left or down.] If Q < 0, do T8; otherwise do T7. Repeat until interrupted.

T6. [Move up.] Create the edge (x, y) – (x, y+1), then set y ← y + 1. Interrupt
to T10 if y = y′; otherwise set Q ← Q + S, R ← R + b, S ← S + 2c.

T7. [Move down.] Create the edge (x, y) – (x, y−1), then set y ← y − 1.
Interrupt to T10 if y = y′; otherwise set Q ← Q − S, R ← R − b, S ← S − 2c.

T8. [Move left.] Create the edge (x, y) – (x−1, y), then set x ← x − 1.
Interrupt to T11 if x = x′; otherwise set Q ← Q − R, R ← R − 2a, S ← S − b.

T9. [Move right.] Create the edge (x, y) – (x+1, y), then set x ← x + 1.
Interrupt to T11 if x = x′; otherwise set Q ← Q + R, R ← R + 2a, S ← S + b.

T10. [Finish horizontally.] While x < x′, create the edge (x, y) – (x+1, y) and
set x ← x + 1. While x > x′, create the edge (x, y) – (x−1, y) and set
x ← x − 1. Terminate the algorithm.

T11. [Finish vertically.] While y < y′, create the edge (x, y) – (x, y+1) and
set y ← y + 1. While y > y′, create the edge (x, y) – (x, y−1) and set
y ← y − 1. Terminate the algorithm. ▮

For example, when this algorithm is invoked with (x, y) = (20, 0), (x′, y′) =
(0, 10), and $Q(x, y) = 4x^2 + 16y^2 - 4x - 16y - 1595$, it will create the edges
(20, 0) – (20, 1) – (20, 2) – (19, 2) – (19, 3) – (19, 4) – (18, 4) –
(18, 5) – (17, 5) – (17, 6) – ⋯ – (6, 9) – (6, 10), then make a beeline
for (0, 10). (See (165) and (171).) Exercise 182 explains why it works.
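Here is a Python transcription of Algorithm T, offered as a sketch rather than a definitive implementation. It returns the lattice points visited instead of "creating" edges. It also folds steps T2–T5 into one loop by remembering the directions dx and dy chosen in T1. The optional `probes` list is our own addition: it collects every value of Q that is tested, so the sequence −75, −43, 21, … quoted earlier can be checked.

```python
def algorithm_t(x, y, x1, y1, a, b, c, d, e, f, probes=None):
    """Sketch of Algorithm T.  (x1, y1) stands for (x', y'); Q is given by its
    integer coefficients as in (168).  Returns the lattice points on the path
    of edges; if `probes` is a list, each value of Q tested is appended."""
    def q(u, v):
        return a * u * u + b * u * v + c * v * v + d * u + e * v + f
    def qx(u, v):
        return 2 * a * u + b * v + d
    def qy(u, v):
        return b * u + 2 * c * v + e

    path = [(x, y)]
    if x != x1 and y != y1:                  # T1 (otherwise go directly to T10/T11)
        dx = 1 if x < x1 else -1
        dy = 1 if y < y1 else -1
        px, py = x + (dx > 0), y + (dy > 0)  # pixel whose sign is tested first
        Q = q(px, py)
        R = qx(px, py) + dx * a
        S = qy(px, py) + dy * c
        # T2 and T5 move horizontally when Q < 0; T3 and T4 move vertically.
        horizontal_if_negative = (dx == dy)
        while True:
            if probes is not None:
                probes.append(Q)
            if (Q < 0) == horizontal_if_negative:
                x += dx                      # T9 (dx = 1) or T8 (dx = -1)
                path.append((x, y))
                if x == x1:
                    break                    # interrupt to T11
                Q, R, S = Q + dx * R, R + dx * 2 * a, S + dx * b
            else:
                y += dy                      # T6 (dy = 1) or T7 (dy = -1)
                path.append((x, y))
                if y == y1:
                    break                    # interrupt to T10
                Q, R, S = Q + dy * S, R + dy * b, S + dy * 2 * c
    while x != x1:                           # T10: finish horizontally
        x += 1 if x < x1 else -1
        path.append((x, y))
    while y != y1:                           # T11: finish vertically
        y += 1 if y < y1 else -1
        path.append((x, y))
    return path

probes = []
path = algorithm_t(20, 0, 0, 10, 4, 0, 16, -4, -16, -1595, probes)
print(path[:10])
print(probes[:7])
```

On the ellipse example, this sketch makes its last vertical move from (6, 9) to (6, 10). It then runs horizontally to (0, 10), for 30 edges in all.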

Movement to the right in step T9 is conveniently implemented by setting
$H(y) \gets H(y) \oplus (1 \ll (x_{\max} - x))$, using the H vectors of (166) and (167).
Movement to the left is similar, but we set x ← x − 1 first. Step T10 could set

$$H(y) \gets H(y) \oplus \bigl((1 \ll (x_{\max} + 1 - \min(x, x'))) - (1 \ll (x_{\max} + 1 - \max(x, x')))\bigr); \qquad (172)$$

but one move at a time might be just as good, because |x′ − x| is often small.
Movement up or down needs no action, because vertical edges are redundant.
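The one-shot mask for step T10 can be compared with the one-bit-at-a-time updates. In this Python sketch (hypothetical helper names), edge (x, y) – (x+1, y) corresponds to bit $x_{\max} - x$ of H(y). That is what the rules for moving right and left give. The block version XORs in all |x′ − x| bits at once with $(1 \ll (x_{\max} + 1 - \min(x, x'))) - (1 \ll (x_{\max} + 1 - \max(x, x')))$.

```python
def h_bit_steps(h, x, x1, xmax):
    """Flip one bit of H(y) per horizontal step, as T9 (right) and T8 (left) do."""
    while x < x1:
        h ^= 1 << (xmax - x)      # edge (x,y)-(x+1,y), then x <- x+1
        x += 1
    while x > x1:
        x -= 1                    # moving left: set x <- x-1 first, then flip
        h ^= 1 << (xmax - x)
    return h

def h_bit_block(h, x, x1, xmax):
    """Flip all of T10's horizontal edges at once with a single mask."""
    lo, hi = min(x, x1), max(x, x1)
    return h ^ ((1 << (xmax + 1 - lo)) - (1 << (xmax + 1 - hi)))

print(bin(h_bit_block(0, 3, 7, 10)))   # edges from x = 3 to x = 7 with x_max = 10
```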

Notice that the algorithm runs somewhat faster in the special case when
b = 0; circles always belong to this case. The even more special case of straight
lines, when a = b = c = 0, is of course faster yet; then we have a simple
one-register algorithm (see exercise 184).


Fig. 18 . Pixels change from 
white to black and back again, 
at the edges of digitized circles. 

When many contours are filled in the same image, using H vectors, the
pixel values change between black and white whenever we cross an odd number
of edges. Figure 18 illustrates a tiling of the hyperbolic plane by equilateral
45°-45°-45° triangles, obtained by superimposing the results of several hundred
applications of Algorithm T.
Algorithm T applies only to conic curves. But that’s not really a limitation
in practice, because just about every shape we ever need to draw can be well
approximated by “piecewise conics” called quadratic Bézier splines or squines. For
example, Fig. 19 shows a typical squine curve with 40 points $(z_0, z_1, \ldots, z_{39}, z_{40})$,
where $z_{40} = z_0$. The even-numbered points $(z_0, z_2, \ldots, z_{40})$ lie on the curve;
the others, $(z_1, z_3, \ldots, z_{39})$, are called “control points” because they regulate
local bending and flexing. Each section $S(z_{2j}, z_{2j+1}, z_{2j+2})$ begins at point $z_{2j}$,
traveling in direction $z_{2j+1} - z_{2j}$. It ends at point $z_{2j+2}$, traveling in direction
$z_{2j+2} - z_{2j+1}$. Thus if $z_{2j}$ lies on the straight line from $z_{2j-1}$ to $z_{2j+1}$, the squine
passes smoothly through point $z_{2j}$ without changing direction.

Exercise 185 defines $S(z_{2j}, z_{2j+1}, z_{2j+2})$ precisely, and exercise 186 explains
how to digitize any squine curve using Algorithm T. The region inside the 
digitized edges can then be filled with black pixels. 
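Exercise 185 gives the precise definition of S. The Python sketch below simply assumes that S is the standard quadratic Bézier parametrization $(1-t)^2 z_0 + 2t(1-t) z_1 + t^2 z_2$, with points represented as complex numbers. Under that assumption the derivative is $2(1-t)(z_1 - z_0) + 2t(z_2 - z_1)$. So each section leaves $z_0$ heading toward the control point $z_1$ and arrives at $z_2$ heading away from it, as described above.

```python
def squine_point(z0, z1, z2, t):
    """Point at parameter t (0 <= t <= 1) of the quadratic Bezier segment S(z0, z1, z2)."""
    return (1 - t) ** 2 * z0 + 2 * t * (1 - t) * z1 + t ** 2 * z2

def squine_velocity(z0, z1, z2, t):
    """Derivative of squine_point with respect to t."""
    return 2 * (1 - t) * (z1 - z0) + 2 * t * (z2 - z1)

z0, z1, z2 = 0, 2 + 2j, 4
print(squine_point(z0, z1, z2, 0.5))      # the point halfway along in t
print(squine_velocity(z0, z1, z2, 0.0))   # parallel to z1 - z0
print(squine_velocity(z0, z1, z2, 1.0))   # parallel to z2 - z1
```

Each section is a parabolic arc, which is why Algorithm T can digitize it.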

Incidentally, the task of drawing lines and curves on a bitmap turns out 
to be much more difficult than the task of filling a digitized contour, because 
we want diagonal strokes to have the same apparent thickness as vertical and 
horizontal strokes do. An excellent solution to the line-drawing problem was
found by John D. Hobby, JACM 36 (1989), 209–229.

*Branchless computation. Modern computers tend to slow down when a
program contains conditional branch instructions, because an uncertain flow
of control can interfere with predictive lookahead circuitry. Therefore we’ve
used MMIX’s conditional-set instructions like CSNZ in programs like (56). Indeed,
the four instructions ‘SRU z,y,16; ADD t,lam,16; CSNZ y,q,z; CSNZ lam,q,t’
found in (56) are probably faster than their three-instruction counterpart

    BZ q,@+12; SRU y,y,16; ADD lam,lam,16                         (173)

when the actual running time is measured on a highly pipelined machine, even
though the rule-of-thumb cost of (173) is only 3υ according to Table 1.3.1–1.
Bitwise operations can help diminish the need for costly branching. For
example, if MMIX didn’t have a CSNZ instruction we could write

    NEG m,q;       OR m,m,q;      SR m,m,63;
    SRU t,y,16;    XOR t,t,y;     AND t,t,m;    XOR y,y,t;          (174)
    ADD t,lam,16;  XOR t,t,lam;   AND t,t,m;    XOR lam,lam,t;

here the first line creates the mask $m = -[q \ne 0]$. On some computers these eleven
branchless instructions would still run faster than the three instructions in (173).
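The same mask idea can be tried out in Python, which has no fixed word size. The sketch below therefore masks results to 64 bits explicitly and uses a helper to imitate MMIX's signed shift SR. It mirrors (174) line by line; the function names are ours. The mask $m = -[q \ne 0]$ is formed from $-q \mid q$, and then each register is conditionally replaced by XORing in (new ⊕ old) & m.

```python
MASK64 = (1 << 64) - 1

def to_signed(v):
    """Interpret a 64-bit pattern as a two's-complement integer (for SR)."""
    v &= MASK64
    return v - (1 << 64) if v >> 63 else v

def conditional_shift(q, y, lam):
    """Like (174): if q != 0, set y <- y >> 16 and lam <- lam + 16; no branch on q."""
    m = (to_signed((-q | q) & MASK64) >> 63) & MASK64   # NEG, OR, SR: m = -[q != 0]
    t = ((y >> 16) ^ y) & m                              # SRU, XOR, AND
    y ^= t                                               # XOR
    t = ((lam + 16) ^ lam) & m                           # ADD, XOR, AND
    lam ^= t                                             # XOR
    return y & MASK64, lam & MASK64

print(conditional_shift(5, 0x123456789ABCDEF0, 3))
print(conditional_shift(0, 0x123456789ABCDEF0, 3))
```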

The inner loop of a merge sort algorithm provides an instructive example.
Suppose we want to do the following operations repeatedly:

If $x_i < y_j$, set $z_k \gets x_i$, $i \gets i + 1$, and go to x_done if $i = i_{\max}$.
Otherwise set $z_k \gets y_j$, $j \gets j + 1$, and go to y_done if $j = j_{\max}$.
Then set $k \gets k + 1$ and go to z_done if $k = k_{\max}$.

If we implement them in the “obvious” way, four conditional branches are involved,
three of which are active on each path through the loop:


    1H   CMP  t,xi,yj; BNN t,2F   Branch if x_i ≥ y_j.
         STO  xi,zbase,kk         z_k ← x_i.
         ADD  ii,ii,8             i ← i + 1.
         BZ   ii,X_Done           To x_done if i = i_max.
         LDO  xi,xbase,ii         Load x_i into register xi.
         JMP  3F                  Join the other branch.
    2H   STO  yj,zbase,kk         z_k ← y_j.
         ADD  jj,jj,8             j ← j + 1.
         BZ   jj,Y_Done           To y_done if j = j_max.
         LDO  yj,ybase,jj         Load y_j into register yj.
    3H   ADD  kk,kk,8             k ← k + 1.
         PBNZ kk,1B               Repeat if k ≠ k_max.
         JMP  Z_Done              To z_done. ▮


(Here ii = 8(i − i_max), jj = 8(j − j_max), and kk = 8(k − k_max); the factor of
8 is needed because x_i, y_j, and z_k are octabytes.) Those four branches can be
reduced to just one:


    1H   CMP  t,xi,yj               t ← sign(x_i − y_j).
         CSN  yj,t,xi               yj ← min(x_i, y_j).
         STO  yj,zbase,kk           z_k ← yj.
         AND  t,t,8                 t ← 8[x_i < y_j].
         ADD  ii,ii,t               i ← i + [x_i < y_j].
         LDO  xi,xbase,ii           Load x_i into register xi.
         XOR  t,t,8                 t ← t ⊕ 8.
         ADD  jj,jj,t               j ← j + [x_i ≥ y_j].
         LDO  yj,ybase,jj           Load y_j into register yj.
         ADD  kk,kk,8               k ← k + 1.
         AND  u,ii,jj; AND u,u,kk   u ← ii & jj & kk.
         PBN  u,1B                  Repeat if i < i_max, j < j_max, and k < k_max.


When the loop stops in this version, we can readily decide whether to continue at
x_done, y_done, or z_done. These instructions load both x_i and y_j from memory
each time, but the redundant value will already be present in the cache.
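To see that the single-branch loop computes the same thing, here is a Python model of it. This is a sketch; the name `merge_loop` and the padding convention are our own assumptions. Python's comparison still produces a Boolean, so the model only imitates the data flow. The flag $[x_i < y_j]$ advances i, its complement advances j, and the smaller key is selected with a mask rather than an `if`. One dummy element is appended to each input because, like the MMIX code, the loop reloads $x_i$ and $y_j$ even after an index reaches its limit.

```python
def merge_loop(xs, ys, kmax):
    """Model of the one-branch merge loop; stops when i = len(xs), j = len(ys),
    or k = kmax, and returns (zs, i, j, k).  Assumes xs, ys nonempty, kmax >= 1."""
    x = list(xs) + [0]       # dummy slot: the loop reloads x_i even when i = i_max
    y = list(ys) + [0]
    i = j = k = 0
    zs = []
    xi, yj = x[0], y[0]
    while True:
        lt = int(xi < yj)                   # [x_i < y_j], as derived from CMP
        m = -lt                             # all 1s if x_i < y_j, else 0
        zs.append(yj ^ ((xi ^ yj) & m))     # min(x_i, y_j) selected by a mask (cf. CSN)
        i += lt                             # i <- i + [x_i < y_j]
        j += lt ^ 1                         # j <- j + [x_i >= y_j]
        k += 1
        xi, yj = x[i], y[j]                 # reload both; one value is redundant
        if not (i < len(xs) and j < len(ys) and k < kmax):
            break
    return zs, i, j, k

print(merge_loop([1, 4, 9], [2, 3, 10], 100))
```

After the loop, i == len(xs) corresponds to continuing at x_done, j == len(ys) to y_done, and k == kmax to z_done.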


*More applications of MOR and MXOR. Let’s finish off our study of bitwise
manipulation by taking a look at two operations that are specifically designed for
64-bit work. MMIX’s instructions MOR and MXOR, which essentially carry out matrix
multiplication on 8 × 8 Boolean matrices, turn out to be extremely flexible and
powerful, both by themselves and in combination with other bitwise operations.

If $x = (x_7 \ldots x_1 x_0)_{256}$ is an octabyte and $a = (a_7 \ldots a_1 a_0)_2$ is a single byte,
the instruction MOR t,x,a sets $t \gets a_7x_7 \mid \cdots \mid a_1x_1 \mid a_0x_0$, while MXOR t,x,a sets
$t \gets a_7x_7 \oplus \cdots \oplus a_1x_1 \oplus a_0x_0$. For example, MOR t,x,2 and MXOR t,x,2 both set
$t \gets x_1$; MOR t,x,3 sets $t \gets x_1 \mid x_0$; and MXOR t,x,3 sets $t \gets x_1 \oplus x_0$.

In general, of course, MOR and MXOR are functions of octabytes. When
$y = (y_7 \ldots y_1 y_0)_{256}$ is a general octabyte, the instruction MOR t,x,y produces the
octabyte t whose jth byte $t_j$ is the result of MOR applied to x and $y_j$.

Suppose x = −1 = #ffffffffffffffff. Then MOR t,x,y computes the
mask t in which byte $t_j$ is #ff whenever $y_j \ne 0$, while $t_j$ is zero when $y_j = 0$. This
simple special case is quite useful, because it accomplishes in just one instruction
what we previously needed seven operations to achieve in situations like (92).
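The byte-level description of MOR and MXOR above translates directly into Python. The sketch below follows that description only and is not a full model of the MMIX specification. Octabytes are nonnegative integers below $2^{64}$, and the helper names are ours. The last line checks the x = −1 special case just described.

```python
def _matmul(x, y, use_xor):
    """Byte j of the result combines the bytes x_i for which bit i of y_j is 1."""
    xb = [(x >> (8 * i)) & 0xFF for i in range(8)]
    t = 0
    for j in range(8):
        yj = (y >> (8 * j)) & 0xFF
        acc = 0
        for i in range(8):
            if (yj >> i) & 1:
                acc = acc ^ xb[i] if use_xor else acc | xb[i]
        t |= acc << (8 * j)
    return t

def mor(x, y):
    """MOR: combine the selected bytes with OR."""
    return _matmul(x, y, use_xor=False)

def mxor(x, y):
    """MXOR: combine the selected bytes with XOR."""
    return _matmul(x, y, use_xor=True)

x = 0x0C0A                     # x_1 = #0c, x_0 = #0a
print(hex(mor(x, 3)), hex(mxor(x, 3)))
print(hex(mor((1 << 64) - 1, 0x00FF000100000020)))   # nonzero bytes become #ff
```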

We observed in (66) that two MORs will suffice to reverse the bits of any 64-bit
word, and many other important bit permutations also become easy when MOR
is in a computer’s repertoire. Suppose π is a permutation of {0, 1, …, 7} that
takes 0 ↦ 0π, 1 ↦ 1π, …, 7 ↦ 7π. Then the octabyte $p = (2^{7\pi} \ldots 2^{1\pi} 2^{0\pi})_{256}$
corresponds to a permutation matrix that makes MOR do nice tricks: MOR t,x,p
will permute the bytes of x, setting $t_j \gets x_{j\pi}$. Furthermore, MOR u,p,y will
permute the bits of each byte of y according to the inverse permutation; it sets
$u_j \gets (a_7 \ldots a_1 a_0)_2$ when $y_j = (a_{7\pi} \ldots a_{1\pi} a_{0\pi})_2$.
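The permutation-matrix tricks can be checked the same way. This self-contained Python sketch repeats a compact `mor` that follows the description in the text. It builds $p = (2^{7\pi} \ldots 2^{1\pi} 2^{0\pi})_{256}$ from a list `pi` with `pi[j]` = jπ; the helper names are hypothetical. With the cyclic permutation jπ = (j + 1) mod 8, MOR t,x,p rotates the bytes of x, and MOR u,p,y moves bit i of each byte of y to position iπ.

```python
def mor(x, y):
    """MOR as described in the text: byte j ORs the bytes x_i with bit i of y_j set."""
    t = 0
    for j in range(8):
        yj = (y >> (8 * j)) & 0xFF
        acc = 0
        for i in range(8):
            if (yj >> i) & 1:
                acc |= (x >> (8 * i)) & 0xFF
        t |= acc << (8 * j)
    return t

def perm_matrix(pi):
    """Octabyte p whose byte j is 2^(j pi), for a permutation pi of 0..7."""
    return sum((1 << pi[j]) << (8 * j) for j in range(8))

p = perm_matrix([(j + 1) % 8 for j in range(8)])
print(hex(mor(0x0807060504030201, p)))   # bytes: t_j = x_{(j+1) mod 8}
print(hex(mor(p, 0x81)))                 # bits 7 and 0 of y_0 move to positions 0 and 1
```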

With a little more skullduggery we can also expedite further permutations
such as the perfect shuffle (76), which transforms a given octabyte
$z = 2^{32}x + y = (x_{31} \ldots x_1 x_0 y_{31} \ldots y_1 y_0)_2$ into the “zippered” octabyte

$$w = x \ddagger y = (x_{31} y_{31} \ldots x_1 y_1 x_0 y_0)_2. \qquad (175)$$

With appropriate permutation matrices p, q, and r, the intermediate results

$$t = (x_{31}x_{27}x_{30}x_{26}x_{29}x_{25}x_{28}x_{24}\,y_{31}y_{27}y_{30}y_{26}y_{29}y_{25}y_{28}y_{24} \ldots x_7x_3x_6x_2x_5x_1x_4x_0\,y_7y_3y_6y_2y_5y_1y_4y_0)_2, \qquad (176)$$

$$u = (y_{27}y_{31}y_{26}y_{30}y_{25}y_{29}y_{24}y_{28}\,x_{27}x_{31}x_{26}x_{30}x_{25}x_{29}x_{24}x_{28} \ldots y_3y_7y_2y_6y_1y_5y_0y_4\,x_3x_7x_2x_6x_1x_5x_0x_4)_2 \qquad (177)$$

can be computed quickly via the four instructions

    MOR t,z,p; MOR t,q,t; MOR u,t,r; MOR u,r,u;                   (178)

see exercise 203. So there’s a mask m for which ‘PUT rM,m; MUX w,t,u’ completes
the perfect shuffle in just six cycles altogether. By contrast, the traditional
method in exercise 53 requires 30 cycles (five δ-swaps).
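Finding the matrices p, q, r of (178) is the subject of exercise 203, so we do not attempt it here. As a simpler check on definition (175), the Python sketch below computes x ‡ y bit by bit. It compares the result with one five-δ-swap sequence of the traditional kind; that choice of shifts and masks is ours and is not taken from the text. Each δ-swap exchanges the bits in positions k and k + δ wherever its mask θ has a 1.

```python
MASK64 = (1 << 64) - 1

def zip_bits(x, y):
    """x ‡ y per (175): bit 2k+1 of the result is x_k and bit 2k is y_k (32-bit x, y)."""
    w = 0
    for k in range(32):
        w |= ((x >> k) & 1) << (2 * k + 1) | ((y >> k) & 1) << (2 * k)
    return w

def delta_swap(z, delta, theta):
    """Swap bits k and k+delta of z for every k where the mask theta has a 1."""
    t = (z ^ (z >> delta)) & theta
    return z ^ t ^ (t << delta)

def shuffle_by_delta_swaps(z):
    """Perfect shuffle of the octabyte z = 2^32 x + y using five delta-swaps."""
    for delta, theta in ((16, 0x00000000FFFF0000), (8, 0x0000FF000000FF00),
                         (4, 0x00F000F000F000F0), (2, 0x0C0C0C0C0C0C0C0C),
                         (1, 0x2222222222222222)):
        z = delta_swap(z, delta, theta)
    return z & MASK64

x, y = 0x89ABCDEF, 0x01234567
print(hex(shuffle_by_delta_swaps((x << 32) | y)), hex(zip_bits(x, y)))
```

The swaps first exchange the middle 16-bit quarters. They then work inward through bytes, nibbles, bit pairs, and single bits, so that the bits of x end up in the odd positions.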

The analogous instruction MXOR is especially useful when binary linear
algebra is involved. For example, exercise 1.3.1–37 shows that XOR and MXOR directly
implement addition and multiplication in a finite field of $2^k$ elements, for k ≤ 8.
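One way to see why MXOR can multiply in GF($2^k$) is that multiplication by a fixed field element a is linear over GF(2). It is therefore given by an 8 × 8 bit matrix whose ith row is $a \cdot x^i$. The Python sketch below assumes k = 8 and the modulus $x^8 + x^4 + x^3 + x + 1$, which is the one used by AES and is chosen here only as an example. It builds that matrix as an octabyte and applies it to a byte b the way MXOR t,m,b would. It then compares the result with an ordinary shift-and-reduce product. All names are hypothetical.

```python
POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1: an example modulus (the one used by AES)

def gf_mul(a, b, poly=POLY):
    """Reference product in GF(2^8): shift-and-add with reduction mod poly."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return product

def mul_matrix(a):
    """Octabyte whose byte i is a * x^i, i.e., the matrix of 'multiply by a'."""
    return sum(gf_mul(a, 1 << i) << (8 * i) for i in range(8))

def mxor_byte(m, b):
    """MXOR with a one-byte right operand: XOR of the bytes m_i for which bit i of b is 1."""
    result = 0
    for i in range(8):
        if (b >> i) & 1:
            result ^= (m >> (8 * i)) & 0xFF
    return result

print(hex(mxor_byte(mul_matrix(0x57), 0x83)))
```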


The problem of cyclic redundancy checking provides an instructive example
of another case where MXOR shines. Streams of data are often accompanied by
“CRC bytes” in order to detect common types of transmission errors [see W. W.
Peterson and D. T. Brown, Proc. IRE 49 (1961), 228–235]. One popular method,
used for example in MP3 audio files, is to regard each byte $a = (a_7 \ldots a_1 a_0)_2$
as if it were the polynomial

$$a(x) = (a_7 \ldots a_1 a_0)_x = a_7 x^7 + \cdots + a_1 x + a_0. \qquad (179)$$

When transmitting n bytes $a_{n-1} \ldots a_1 a_0$, we then compute the remainder

$$\beta(x) = \bigl(a_{n-1}(x)x^{8(n-1)} + \cdots + a_1(x)x^8 + a_0(x)\bigr)x^{16} \bmod p(x), \qquad (180)$$

where $p(x) = x^{16} + x^{15} + x^2 + 1$, using polynomial arithmetic mod 2, and append
the coefficients of β(x) as a 16-bit redundancy check.

The usual way to compute β is to process one byte at a time, according to
classical methods like Algorithm 4.6.1D. The basic idea is to define the partial
result $\beta_m = \bigl(a_{n-1}(x)x^{8(n-1-m)} + \cdots + a_{m+1}(x)x^8 + a_m(x)\bigr)x^{16} \bmod p(x)$, so that $\beta_n = 0$
and $\beta_0 = \beta$, and then to use the recursion

$$\beta_m = \bigl((\beta_{m+1} \ll 8) \mathbin{\&} \#\mathrm{ff00}\bigr) \oplus \mathrm{crc\_table}\bigl[(\beta_{m+1} \gg 8) \oplus a_m\bigr] \qquad (181)$$

to decrease m by 1 until m = 0. Here crc_table[a] is a 16-bit table entry that
holds the remainder of $a(x)x^{16}$ modulo p(x) and mod 2, for 0 ≤ a < 256.
[See A. Perez, IEEE Micro 3, 3 (June 1983), 40–50.]
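Recursion (181) is easy to try out. The following Python sketch (ours; names are hypothetical) represents polynomials over GF(2) as integers and builds crc_table from its definition. It runs (181) with `data[0]` playing the role of $a_{n-1}$, the first byte transmitted. It uses no initial value or final complement, exactly as in (180), so it should not be assumed to match any particular file format's CRC conventions.

```python
P = 0x18005  # p(x) = x^16 + x^15 + x^2 + 1, as a bit vector

def poly_mod(v, p=P):
    """Remainder of v(x) modulo p(x), with coefficients mod 2 (ints as bit vectors)."""
    dp = p.bit_length()
    while v.bit_length() >= dp:
        v ^= p << (v.bit_length() - dp)
    return v

# crc_table[a] holds a(x) x^16 mod p(x), for 0 <= a < 256
CRC_TABLE = [poly_mod(a << 16) for a in range(256)]

def crc16_bytewise(data):
    """Recursion (181); data[0] is a_{n-1}, the first byte transmitted."""
    beta = 0                                   # beta_n = 0
    for a in data:                             # a_{n-1}, a_{n-2}, ..., a_0
        beta = ((beta << 8) & 0xFF00) ^ CRC_TABLE[(beta >> 8) ^ a]
    return beta

msg = b"123456789"
print(hex(crc16_bytewise(msg)), hex(poly_mod(int.from_bytes(msg, "big") << 16)))
```

The second printed value computes (180) directly as one long polynomial remainder, which is a convenient cross-check for the table-driven loop.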

But of course we’d prefer to process 64 bits at once instead of 8. The solution
is to find 8 × 8 matrices A and B such that

$$a(x)x^{64} \equiv (aA)(x) + (aB)(x)x^{-8} \pmod{p(x) \text{ and } 2}, \qquad (182)$$

for arbitrary bytes a, considering a to be a 1 × 8 vector of bits. (Since p(0) = 1,
the polynomial x is invertible modulo p(x), so $x^{-8}$ makes sense here.) Then we can
pad the given data bytes $a_{n-1} \ldots a_1 a_0$ with leading zeros so that n is a multiple
of 8, and use the following efficient reduction method:

Begin with $c \gets 0$, $n \gets n - 8$, and $t \gets (a_{n+7} \ldots a_n)_{256}$.
While $n > 0$, set $u \gets t \cdot A$, $v \gets t \cdot B$, $n \gets n - 8$,
$t \gets (a_{n+7} \ldots a_n)_{256} \oplus u \oplus (v \gg 8) \oplus (c \ll 56)$, and $c \gets v \mathbin{\&} \#\mathrm{ff}$.   (183)

Here t · A and t · B denote matrix multiplication via MXOR. The desired CRC
bytes, $(t(x)x^{16} + c(x)x^8) \bmod p(x)$, are then readily obtained from the 64-bit quantity t
and the 8-bit quantity c. Exercise 210 contains full details; the total running
time for n bytes comes to only (μ + 10υ)n/8 + O(1).
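The 64-bit method (183) can also be modeled in Python. This sketch (ours, with hypothetical names) reads (182) as involving $x^{-8}$, the inverse of $x^8$ modulo p(x). Multiplying (182) by $x^8$ gives $a(x)x^{72} \equiv (aA)(x)x^8 + (aB)(x)$. So row i of A is the high byte, and row i of B the low byte, of $x^{i+72} \bmod p(x)$. The helper `times` multiplies every byte of an octabyte by such a matrix, which is the job MXOR does in the text. The test compares the result with the direct remainder (180).

```python
P = 0x18005  # p(x) = x^16 + x^15 + x^2 + 1

def poly_mod(v, p=P):
    """Remainder of v(x) modulo p(x), coefficients mod 2."""
    dp = p.bit_length()
    while v.bit_length() >= dp:
        v ^= p << (v.bit_length() - dp)
    return v

# Row i of A (resp. B) is the image of the one-bit byte x^i: the high (resp. low)
# byte of x^(i+72) mod p(x), since a(x) x^72 = (aA)(x) x^8 + (aB)(x) (mod p(x) and 2).
ROWS_A = [poly_mod(1 << (i + 72)) >> 8 for i in range(8)]
ROWS_B = [poly_mod(1 << (i + 72)) & 0xFF for i in range(8)]

def times(t, rows):
    """Multiply each byte of the octabyte t by an 8x8 bit matrix (MXOR's job)."""
    out = 0
    for j in range(8):
        tj = (t >> (8 * j)) & 0xFF
        acc = 0
        for i in range(8):
            if (tj >> i) & 1:
                acc ^= rows[i]
        out |= acc << (8 * j)
    return out

def crc16_by_octabytes(data):
    """The reduction method (183); data[0] is the first byte transmitted."""
    if not data:
        return 0
    data = bytes(-len(data) % 8) + data            # pad with leading zero bytes
    blocks = [int.from_bytes(data[k:k + 8], "big") for k in range(0, len(data), 8)]
    t, c = blocks[0], 0
    for block in blocks[1:]:
        u, v = times(t, ROWS_A), times(t, ROWS_B)
        t = block ^ u ^ (v >> 8) ^ (c << 56)
        c = v & 0xFF
    return poly_mod((t << 16) ^ (c << 8))           # (t(x) x^16 + c(x) x^8) mod p(x)

print(hex(crc16_by_octabytes(bytes(range(1, 40)))))
```

The invariant behind the loop is that the data processed so far is congruent to $t(x) + c(x)x^{-8}$ modulo p(x). Each pass multiplies that quantity by $x^{64}$ and adds the next eight bytes.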

The exercises below contain many more instances where MOR and MXOR lead
to substantial economies. New tricks undoubtedly remain to be discovered.

For further reading. The book Hacker’s Delight by Henry S. Warren, Jr.
(Addison–Wesley, 2002) discusses bitwise operations in depth, emphasizing the
great variety of options that are available on real-world computers that are not
as ideal as MMIX.


EXERCISES


► 1. [15] What is the net effect of setting $x \gets x \oplus y$, $y \gets y \oplus (x \mathbin{\&} m)$, $x \gets x \oplus y$?


2. [16] (H. S. Warren, Jr.) Are any of the following relations valid for all integers x
and y? (i) $x \oplus y \le x \mid y$; (ii) $x \mathbin{\&} y \le x \mid y$; (iii) $|x - y| \le x \oplus y$.

3. [M20] If $x = (x_{n-1} \ldots x_1 x_0)_2$ with $x_{n-1} = 1$, let $x^M = (\bar x_{n-1} \ldots \bar x_1 \bar x_0)_2$. Thus we
have $0^M, 1^M, 2^M, 3^M, \ldots = -1, 0, 1, 0, 3, 2, 1, 0, 7, 6, \ldots$, if we let $0^M = -1$. Prove
that $(x \oplus y)^M < |x - y| \le x \oplus y$ for all $x, y \ge 0$.


► 4. [M16] Let $x^C = \bar x$, $x^N = -x$, $x^S = x + 1$, and $x^P = x - 1$ denote the complement,
the negative, the successor, and the predecessor of an infinite-precision integer x. Then
we have $x^{CC} = x^{NN} = x^{SP} = x^{PS} = x$. What are $x^{CN}$ and $x^{NC}$?


5. [M21] Prove or disprove the following conjectured laws concerning binary shifts:

a) $(x \ll j) \ll k = x \ll (j + k)$;

b) $(x \gg j) \mathbin{\&} (y \ll k) = ((x \gg (j + k)) \mathbin{\&} y) \ll k = (x \mathbin{\&} (y \ll (j + k))) \gg j$.

6. [M22] Find all integers x and y such that (a) $x \gg y = y \gg x$; (b) $x \ll y = y \ll x$.

7. [M22] (R. Schroeppel, 1972.) Find a fast way to convert the binary number
$x = (\ldots x_2 x_1 x_0)_2$ to its negabinary counterpart $x = (\ldots x'_2 x'_1 x'_0)_{-2}$, and vice versa.
Hint: Only two bitwise operations are needed!

► 8. [M22] Given a finite set S of nonnegative integers, the “minimal excludant” of S
is defined to be

$$\operatorname{mex}(S) = \min\{k \mid k \ge 0 \text{ and } k \notin S\}.$$

Let $x \oplus S$ denote the set $\{x \oplus y \mid y \in S\}$. Prove that if $x = \operatorname{mex}(S)$ and $y = \operatorname{mex}(T)$
then $x \oplus y = \operatorname{mex}((S \oplus y) \cup (x \oplus T))$.

9. [M26] (Nim.) Two people play a game with k piles of sticks, where there are $a_j$
sticks in pile j. If $a_1 = \cdots = a_k = 0$ when it is a player’s turn to move, that player
loses; otherwise the player reduces one of the piles by any desired amount, throwing
away the removed sticks, and it is the other player’s turn. Prove that the player to
move can force a victory if and only if $a_1 \oplus \cdots \oplus a_k \ne 0$.
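The exercise asks for a proof. The Python experiment below is only a sanity check of the stated criterion for small piles and proves nothing in general. It solves the game by brute force with memoization, treating positions as sorted tuples since the order of piles does not matter. It then compares the result with the test $a_1 \oplus \cdots \oplus a_k \ne 0$. The function names are ours.

```python
from functools import lru_cache, reduce
from itertools import product
from operator import xor

@lru_cache(maxsize=None)
def to_move_wins(piles):
    """True if the player to move can force a win from `piles` (a sorted tuple).

    A position with no sticks left has no moves, so the player to move loses.
    """
    for i, a in enumerate(piles):
        for smaller in range(a):                 # reduce pile i to `smaller`
            nxt = tuple(sorted(piles[:i] + (smaller,) + piles[i + 1:]))
            if not to_move_wins(nxt):
                return True                      # found a move to a losing position
    return False

def nim_sum_rule_holds(k, limit):
    """Compare brute force with a_1 XOR ... XOR a_k != 0 for all piles < limit."""
    return all(to_move_wins(tuple(sorted(p))) == (reduce(xor, p, 0) != 0)
               for p in product(range(limit), repeat=k))

print(nim_sum_rule_holds(3, 6), nim_sum_rule_holds(4, 4))
```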

10. [HM40] (Conway’s field.) Continuing exercise 8, define the operation $x \otimes y$ of
“nim multiplication” recursively by the formula

$$x \otimes y = \operatorname{mex}\{(x \otimes j) \oplus (i \otimes y) \oplus (i \otimes j) \mid 0 \le i < x,\ 0 \le j < y\}.$$

Prove that ⊕ and ⊗ define a field over the set of all nonnegative integers. Prove also
that if $0 \le x, y < 2^{2^n}$ then $x \otimes y < 2^{2^n}$ and $x \otimes 2^{2^n} = 2^{2^n}x$. (In particular, this field
contains subfields of size $2^{2^n}$ for all n ≥ 0.) Explain how to compute $x \otimes y$ efficiently.

► 11. [M26] (H. W. Lenstra, 1978.) Find a simple way to characterize all pairs of
positive integers (m, n) for which $m \otimes n = mn$ in Conway’s field.

12. [M26] Devise an algorithm for division in Conway’s field. Hint: If $x < 2^{2^{n+1}}$ then
we have $x \otimes (x \oplus (x \gg 2^n)) < 2^{2^n}$.

13. [M32] (Second-order nim.) Extend the game of exercise 9 by allowing two kinds
of moves: Either $a_j$ is reduced for some j, as before; or $a_j$ is reduced and $a_i$ is replaced
by an arbitrary nonnegative integer, for some i < j. Prove that the player to move
can now force a victory if and only if the pile sizes satisfy either $a_2 \ne a_3 \oplus \cdots \oplus a_k$ or
$a_1 \ne a_3 \oplus (2 \otimes a_4) \oplus \cdots \oplus ((k - 2) \otimes a_k)$. For example, when k = 4 and
$(a_1, a_2, a_3, a_4) = (7, 5, 0, 5)$, the only winning move is to (7, 5, 6, 3).
14. [M30] Suppose each node of a complete, infinite binary tree has been labeled with
0 or 1. Such a labeling is conveniently represented as a set
$T = \{t, t_0, t_1, t_{00}, t_{01}, t_{10}, t_{11}, t_{000}, \ldots\}$, with one bit $t_\alpha$ for every binary string α; the root is labeled t, the left
subtree labels are $T_0 = \{t_0, t_{00}, t_{01}, t_{000}, \ldots\}$, and the right subtree labels are
$T_1 = \{t_1, t_{10}, t_{11}, t_{100}, \ldots\}$. Any such labeling can be used to transform a 2-adic integer
$x = (\ldots x_2 x_1 x_0)_2$ into the 2-adic integer $y = (\ldots y_2 y_1 y_0)_2 = T(x)$ by setting $y_0 = t$,
$y_1 = t_{x_0}$, $y_2 = t_{x_0 x_1}$, etc., so that $T(x) = 2T_{x_0}(\lfloor x/2 \rfloor) + t$. (In other words, x defines
an infinite path in the binary tree, and y corresponds to the labels on that path, from
right to left in the bit strings as we proceed from top to bottom of the tree.)

A branching function is the mapping $x^T = x \oplus T(x)$ defined by such a labeling.
For example, if $t_{01} = 1$ and all of the other $t_\alpha$ are 0, we have $x^T = x \oplus 4[x \bmod 4 = 2]$.

a) Prove that every branching function is a permutation of the 2-adic integers.

b) For which integers k is $x \oplus (x \ll k)$ a branching function?

c) Let $x \mapsto x^T$ be a mapping from 2-adic integers into 2-adic integers. Prove that $x^T$
is a branching function if and only if $\rho(x \oplus y) = \rho(x^T \oplus y^T)$ for all 2-adic x and y.

d) Prove that compositions and inverses of branching functions are branching
functions. (Thus the set B of all branching functions is a permutation group.)

e) A branching function is balanced if the labels satisfy $t_\alpha = t_{\alpha 0} \oplus t_{\alpha 1}$ for all α. Show
that the set of all balanced branching functions is a subgroup of B.

15. [M21] J. H. Quick noticed that $((x + 2) \oplus 3) - 2 = ((x - 2) \oplus 3) + 2$ for all x. Find
all constants a and b such that $((x + a) \oplus b) - a = ((x - a) \oplus b) + a$ is an identity.

16. [M31] A function of x is called animating if it can be written in the form

$$((\cdots((((x + a_1) \oplus b_1) + a_2) \oplus b_2) + \cdots) + a_m) \oplus b_m$$

for some integer constants $a_1, b_1, a_2, b_2, \ldots, a_m, b_m$, with m > 0.

a) Prove that every animating function is a branching function (see exercise 14).

b) Furthermore, it is balanced if and only if $b_1 \oplus b_2 \oplus \cdots \oplus b_m = 0$. Hint: What
binary tree labeling corresponds to the animating function $((x \oplus c) - 1) \oplus c$?

c) Let $\lfloor x \rceil = x \oplus (x - 1) = 2^{\rho(x)+1} - 1$. Show that every balanced animating function
can be written in the form

$$x \oplus \lfloor x \oplus p_1 \rceil \oplus \lfloor x \oplus p_2 \rceil \oplus \cdots \oplus \lfloor x \oplus p_l \rceil, \qquad p_1 < p_2 < \cdots < p_l,$$

for some integers $\{p_1, p_2, \ldots, p_l\}$, where l ≥ 0, and this representation is unique.

d) Conversely, show that every such expression defines a balanced animating function.


17. [HM36] The results of exercise 16 make it possible to decide whether or not any
two given animating functions are equal. Is there an algorithm that decides whether
any given expression is identically zero, when that expression is constructed from a
finite number of integer variables and constants using only the binary operations
+ and ⊕? What if we also allow &?

18. [M25] The curious pixel pattern shown here has $(x^2 y \gg 11) \mathbin{\&} 1$ in row x and
column y, for $1 \le x, y \le 256$. Is there any simple way to explain some of its major
characteristics mathematically?
► 19. [M37] (Paley’s rearrangement theorem.) Given three vectors $A = (a_0, \ldots, a_{2^n-1})$,
$B = (b_0, \ldots, b_{2^n-1})$, and $C = (c_0, \ldots, c_{2^n-1})$ of nonnegative numbers, let

$$f(A, B, C) = \sum_{j \oplus k \oplus l = 0} a_j b_k c_l.$$

For example, if n = 2 we have $f(A, B, C) = a_0b_0c_0 + a_0b_1c_1 + a_0b_2c_2 + a_0b_3c_3 + a_1b_0c_1 + a_1b_1c_0 + a_1b_2c_3 + \cdots + a_3b_3c_0$; in general there are $2^{2n}$ terms, one for each choice of
j and k. Our goal is to prove that $f(A, B, C) \le f(A^*, B^*, C^*)$, where $A^*$ denotes the
vector A sorted into nonincreasing order, $a^*_0 \ge a^*_1 \ge \cdots \ge a^*_{2^n-1}$.

a) Prove the result when all elements of A, B, and C are 0s and 1s.

b) Show that it is therefore true in general.

c) Similarly, $f(A, B, C, D) = \sum_{j \oplus k \oplus l \oplus m = 0} a_j b_k c_l d_m \le f(A^*, B^*, C^*, D^*)$.

► 20. [21] (Gosper’s hack.) The following seven operations produce a useful function y
of x, when x is a positive integer. Explain what this function is and why it is useful.

$$u \gets x \mathbin{\&} {-x}; \quad v \gets x + u; \quad y \gets v + (((v \oplus x)/u) \gg 2).$$

21. [22] Construct the reverse of Gosper’s hack: Show how to compute x from y.

22. [21] Implement Gosper’s hack efficiently with MMIX code, assuming that $x < 2^{64}$,
without using division.

► 23. [27] A sequence of nested parentheses can be represented as a binary number by
putting a 1 in the position of each right parenthesis. For example, ‘(())()’ corresponds
in this way to $(001101)_2$, the number 13. Call such a number a parenthesis trace.

a) What are the smallest and largest parenthesis traces that have exactly m 1s?

b) Suppose x is a parenthesis trace and y is the next larger parenthesis trace with
the same number of 1s. Show that y can be computed from x with a short chain
of operations analogous to Gosper’s hack.

c) Implement your method on MMIX, assuming that νx < 32.

► 24. [M30] Program 1.3.2′P instructed MMIX to produce a table of the first five hundred
prime numbers, using trial division to establish primality. Write an MMIX program that
uses the “sieve of Eratosthenes” (exercise 4.5.4–8) to build a table of all odd primes
that are less than N, packed into octabytes $Q_0, Q_1, \ldots, Q_{N/128}$ as in (27). Assume that
$N < 2^{32}$, and that it’s a multiple of 128. What is the running time when N = 3584?

► 25. [15] Four volumes sit side by side on a bookshelf. Each of them contains exactly
500 pages, printed on 250 sheets of paper 0.1 mm thick; each book also has a front and
back cover whose thicknesses are 1 mm each. A bookworm gnaws its way from page 1
of Volume 1 to page 500 of Volume 4. How far does it travel while doing so?

26. [22] Suppose we want random access to a table of 12 million items of 5-bit data.
We could pack 12 such items into one 64-bit word, thereby fitting the table into 8
megabytes of memory. But random access then seems to require division by 12, which
is rather slow; we might therefore prefer to let each item occupy a full byte, thus using
12 megabytes altogether.

Show, however, that there’s a memory-efficient approach that avoids division.

27. [21] In the notation of Eqs. (32)–(43), how would you compute (a) $(\alpha 1 0^a 0 1^b)_2$?
(b) $(\alpha 1 0^a 1 1^b)_2$? (c) $(\alpha 0 0^a 0 1^b)_2$? (d) $(0^\infty 1 1^a 0 0^b)_2$? (e) $(0^\infty 0 1^a 0 0^b)_2$? (f) $(0^\infty 1 1^a 1 1^b)_2$?

28. [16] What does the operation $(x + 1) \mathbin{\&} x$ produce?

29. [20] (V. R. Pratt.) Express the magic mask $\mu_k$ of (47) in terms of $\mu_{k+1}$.
30. [20] If x = 0, the MMIX instructions (46) will set ρ ← 64 (which is a close enough
approximation to ∞). What changes to (50) and (51) will produce the same result?

► 31. [20] A mathematician named Dr. L. I. Presume decided to calculate the ruler
function with a simple loop as follows: “Set ρ ← 0; then while x & 1 = 0, set ρ ← ρ + 1
and x ← x ≫ 1.” He reasoned that, when x is a random integer, the average number
of right shifts is the average value of ρ, which is 1; and the standard deviation is only
√2, so the loop almost always terminates quickly. Criticize his decision.

32. [20] What is the execution time for ρx when (52) is programmed for MMIX?

► 33. [26] (Leiserson, Prokop, and Randall, 1998.) Show that if ‘58’ is replaced by ‘49’
in (52), we can use that method to identify both bits of the number $y = 2^j + 2^k$ quickly,
when $64 > j > k \ge 0$. (Altogether $\binom{64}{2} = 2016$ cases need to be distinguished.)

34. [M23] Let x and y be 2-adic integers. Prove or disprove: ρx = ρy if and only if
$x \oplus y = (x - 1) \oplus (y - 1)$.

► 35. [M26] According to Reitwiesner’s theorem, exercise 4.1–34, every integer n has a
unique representation $n = n^+ - n^-$ such that $\nu(n^+) + \nu(n^-)$ is minimized. Show that
$n^+$ and $n^-$ can be calculated quickly with bitwise operations. Hint: Prove the identity
$(x \oplus 3x) \mathbin{\&} ((x \oplus 3x) \gg 1) = 0$.

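36. [20] Given $x = (x_{63} \ldots x_1 x_0)_2$, suggest efficient ways to calculate the quantities

i) $x^{\oplus} = (x^{\oplus}_{63} \ldots x^{\oplus}_1 x^{\oplus}_0)_2$, where $x^{\oplus}_k = x_k \oplus \cdots \oplus x_1 \oplus x_0$ for $0 \le k < 64$;

ii) $x^{\&} = (x^{\&}_{63} \ldots x^{\&}_1 x^{\&}_0)_2$, where $x^{\&}_k = x_k \wedge \cdots \wedge x_1 \wedge x_0$ for $0 \le k < 64$.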

37. [16] What changes to (55) and (56) will make λ0 come out −1?

38. [17] How long does the leftmost-bit-extraction procedure (57) take when
implemented on MMIX?

► 39. [20] Formula (43) shows how to remove the rightmost run of 1 bits from a given
number x. How would you remove the leftmost run of 1 bits? 

► 40. [21] Prove (58), and find a simple way to decide if λx < λy, given x and y > 0.

41. [M22] What are the generating functions of the integer sequences (a) ρn, (b) λn,
and (c) νn?

42. [M21] If $n = 2^{e_1} + \cdots + 2^{e_r}$, with $e_1 > \cdots > e_r \ge 0$, express the sum
in terms of the exponents $e_1, \ldots, e_r$.

► 43. [20] How sparse should x be, to make (63) faster than (62) on MMIX?

► 44. [23] (E. Freed, 1983.) What’s a fast way to evaluate the weighted bit sum 

45. [20] (T. Rokicki, 1999.) Explain how to test if $x^R < y^R$, without reversing x and y.

46. [22] Method (68) uses six operations to interchange two bits $x_i \leftrightarrow x_j$ of a register.
Show that this interchange can actually be done with only three MMIX instructions.

47. [10] Can the general δ-swap (69) also be done with a method like (67)?

48. [M21] How many different δ-swaps are possible in an n-bit register? (When n = 4,
a δ-swap can transform 1234 into 1234, 1243, 1324, 1432, 2134, 2143, 3214, 3412, 4231.)

► 49. [M30] Let s(n) denote the fewest δ-swaps that suffice to reverse an n-bit number.

a) Prove that $s(n) \ge \lceil \log_3 n \rceil$ when n is odd, $s(n) \ge \lceil \log_3 3n/2 \rceil$ when n is even.

b) Evaluate s(n) when $n = 3^m$, $2 \cdot 3^m$, $(3^m + 1)/2$, and $(3^m - 1)/2$.

c) What are s(32) and s(64)? Hint: Show that $s(5n + 2) \le s(n) + 2$.

50. [M37] Continuing exercise 49, prove that $s(n) = \log_3 n + O(\log \log n)$.
51. [23] Let c be a constant, 0 < c < 2^d. Find all sequences of masks (θ_0, θ_1, ..., θ_{d−1},
θ̂_{d−2}, ..., θ̂_0) such that the general permutation scheme (71) takes x ↦ x^π, where
the bit permutation π is defined by either (a) jπ = j ⊕ c; or (b) jπ = (j + c) mod 2^d.

[The masks should satisfy θ_k ⊆ μ_{d,k} and θ̂_k ⊆ μ_{d,k}, so that (71) corresponds to Fig. 12;
see (48). Notice that reversal, x^π = x^R, is the special case c = 2^d − 1 of part (a), while
part (b) corresponds to the cyclic right shift x^π = (x ≫ c) + (x ≪ (2^d − c)).]

52. [22] Find hexadecimal constants (θ_0, θ_1, θ_2, θ_3, θ_4, θ_5, θ̂_4, θ̂_3, θ̂_2, θ̂_1, θ̂_0) that cause
(71) to produce the following important 64-bit permutations, based on the binary
representation j = (j_5 j_4 j_3 j_2 j_1 j_0)_2: (a) jπ = (j_0 j_5 j_4 j_3 j_2 j_1)_2; (b) jπ = (j_2 j_1 j_0 j_5 j_4 j_3)_2;
(c) jπ = (j_1 j_0 j_5 j_4 j_3 j_2)_2; (d) jπ = (j_0 j_1 j_2 j_3 j_4 j_5)_2. [Case (a) is called a “perfect shuffle,”
because it takes (x_63 ... x_33 x_32 x_31 ... x_1 x_0)_2 into (x_63 x_31 ... x_33 x_1 x_32 x_0)_2; case (b)
transposes an 8 × 8 matrix of bits; case (c), similarly, transposes a 16 × 4 matrix; and
case (d) arises in connection with “fast Fourier transforms,” see exercise 4.6.4–14.]
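Case (a) can be tried out independently of scheme (71). The C sketch below builds the perfect shuffle from five δ-swaps, each of which exchanges the middle two quarters of every block of 4δ bits. It assumes the δ-swap form y ← (x ⊕ (x ≫ δ)) & θ, x ← x ⊕ y ⊕ (y ≪ δ). It is not the eleven-mask solution the exercise asks for, and the names are hypothetical.

```c
#include <stdint.h>

/* Exchanges bit p with bit p + delta for every 1 bit p of theta. */
static uint64_t delta_swap(uint64_t x, uint64_t theta, int delta) {
  uint64_t y = (x ^ (x >> delta)) & theta;
  return x ^ y ^ (y << delta);
}

/* Perfect shuffle: bit 2k of the result is x_k and bit 2k+1 is x_{32+k},
   for 0 <= k < 32.  Swapping the middle quarters of each block of
   4*delta bits, for delta = 16, 8, 4, 2, 1, interleaves the two halves. */
uint64_t perfect_shuffle(uint64_t x) {
  x = delta_swap(x, 0x00000000FFFF0000ULL, 16);
  x = delta_swap(x, 0x0000FF000000FF00ULL, 8);
  x = delta_swap(x, 0x00F000F000F000F0ULL, 4);
  x = delta_swap(x, 0x0C0C0C0C0C0C0C0CULL, 2);
  x = delta_swap(x, 0x2222222222222222ULL, 1);
  return x;
}
```

Running the five steps in reverse order, with the same masks, undoes the shuffle.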

53. [M25] The permutations in exercise 52 are said to be “induced by a permutation
of index digits,” because we obtain jπ by permuting the binary digits of j. Suppose
jπ = (j_{(d−1)ψ} ... j_{1ψ} j_{0ψ})_2, where ψ is a permutation of {0, 1, ..., d − 1}. Prove that if
ψ has t cycles, the 2^d-bit permutation x^π can be obtained with only d − t swaps.

In particular, show that this observation speeds up all four cases of exercise 52.

54. [22] (R. W. Gosper, 1985.) If an m × m bit matrix is stored in the rightmost
m^2 bits of a register, show that it can be transposed by doing (2^k(m − 1))-swaps for
0 ≤ k < ⌈lg m⌉. Write out the method in detail when m = 7.
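For m = 8 the recipe of exercise 54 uses (2^k(m − 1))-swaps with δ = 7, 14, 28. The C sketch below works on a 64-bit word and makes two assumptions. First, the layout is row-major, with element (i, j) of the matrix in bit 8i + j. Second, the δ-swap has the form y ← (x ⊕ (x ≫ δ)) & θ, x ← x ⊕ y ⊕ (y ≪ δ). The masks are our own choice, and the exercise's case m = 7 is left as posed. The same code also performs case (b) of exercise 52, using three δ-swaps instead of scheme (71).

```c
#include <stdint.h>

/* Exchanges bit p with bit p + delta for every 1 bit p of theta. */
static uint64_t delta_swap(uint64_t x, uint64_t theta, int delta) {
  uint64_t y = (x ^ (x >> delta)) & theta;
  return x ^ y ^ (y << delta);
}

/* Transposes the 8 x 8 bit matrix whose element (i, j) is bit 8i + j.
   delta = 7 transposes each 2 x 2 block; delta = 14 moves 2 x 2 blocks
   across the diagonal of each 4 x 4 block; delta = 28 does the same
   for the 4 x 4 blocks of the whole matrix. */
uint64_t transpose8(uint64_t x) {
  x = delta_swap(x, 0x00AA00AA00AA00AAULL, 7);
  x = delta_swap(x, 0x0000CCCC0000CCCCULL, 14);
  x = delta_swap(x, 0x00000000F0F0F0F0ULL, 28);
  return x;
}
```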

55. [26] Suppose an n × n bit matrix is stored in the rightmost n^2 bits of an n^3-bit register.
Prove that 18d + 2 bitwise operations suffice to multiply two such matrices, when
n = 2^d; the matrix multiplication can be either Boolean (like MOR) or mod 2 (like MXOR).

56. [24] Suggest a way to transpose a 7 × 9 bit matrix in a 64-bit register.

57. [22] Prove that any permutation of 2^d elements can be realized with the network
P(2^d) of Fig. 12 by some setting in which at most d/(2d − 1) of the crossbars are active.

58. [M27] The first d columns of crossbar modules in the permutation network P(2^d)
perform a 1-swap, then a 2-swap, ..., and finally a 2^{d−1}-swap, when the wires of the
network are stretched into horizontal lines as shown here for d = 3.

[Figure: the wires of P(8) stretched into horizontal lines, forming an Omega router.]

Let N = 2^d. These N lines, together with the Nd/2 crossbars, form a so-called “Omega
router.” The purpose of this exercise is to study the set Ω of all permutations φ such that
we can obtain (0φ, 1φ, ..., (N − 1)φ) as outputs on the right of an Omega router when the
inputs at the left are (0, 1, ..., N − 1).

a) Prove that |Ω| = 2^{Nd/2}. (Thus lg |Ω| = Nd/2 ∼ ½ lg N!.)

b) Prove that a permutation φ of {0, 1, ..., N − 1} belongs to Ω if and only if

$$ i \bmod 2^k = j \bmod 2^k \text{ and } i\varphi \gg k = j\varphi \gg k \text{ implies } i\varphi = j\varphi \qquad (*) $$

for all 0 ≤ i, j < N and all 0 ≤ k < d.

c) Simplify condition (*) to the following, for all 0 ≤ i, j < N:

$$ \lambda(i\varphi \oplus j\varphi) < \rho(i \oplus j) \text{ implies } i = j. $$

d) Let T be the set of all permutations τ of {0, 1, ..., N − 1} such that ρ(i ⊕ j) =
ρ(iτ ⊕ jτ) for all i and j. (This is the set of branching functions considered in exercise 14,
modulo 2^d; so it has 2^{N−1} members, 2^{N/2+d−1} of which are the animating
functions modulo 2^d.) Prove that φ ∈ Ω if and only if τφ ∈ Ω for all τ ∈ T.

e) Suppose φ and ψ are permutations of Ω that operate on different elements; that
is, jφ ≠ j implies jψ = j, for 0 ≤ j < N. Prove that φψ ∈ Ω.

59. [M30] Given 0 ≤ a ≤ b < N = 2^d, how many Omega-routable permutations
operate only on the interval [a .. b]? (Thus we want to count the number of φ ∈ Ω such
that jφ ≠ j implies a ≤ j ≤ b. Exercise 58(a) is the special case a = 0, b = N − 1.)

60. [HM28] Given a random permutation of {0, 1, ..., 2n − 1}, let p_{nk} be the probability
that there are 2^k ways to set the crossbars in the first and last columns of the
permutation network P(2n) when realizing this permutation. In other words, p_{nk} is the
probability that the associated graph has k cycles (see (75)). What is the generating
function Σ_{k≥0} p_{nk} z^k? What are the mean and variance of 2^k?

61. [46] Is it NP-hard to decide whether a given permutation is realizable with at
least one mask θ_j = 0, using the recursive method of Fig. 12 as implemented in (71)?

► 62. [22] Let N = 2^d. We can obviously represent a permutation π of {0, 1, ..., N − 1}
by storing a table of N numbers, d bits each. With this representation we have instant
access to y = xπ, given x; but it takes Ω(N) steps to find x = yπ^− when y is given.

Show that, with the same amount of memory, we can represent an arbitrary
permutation in such a way that xπ and yπ^− are both computable in O(d) steps.

63. [16] For what integers w, x, y, and z does the zipper function satisfy (i) x ‡ y = y ‡ x?
(ii) (x ‡ y) ≫ z = (x ≫ ⌈z/2⌉) ‡ (y ≫ ⌊z/2⌋)? (iii) (w ‡ x) & (y ‡ z) = (w & y) ‡ (x & z)?

64. [22] Find a “simple” expression for the zipper-of-sums (x + x′) ‡ (y + y′) as a
function of z = x ‡ y and z′ = x′ ‡ y′.

65. [M16] The binary polynomial u(x) = u_0 + u_1 x + ··· + u_{n−1} x^{n−1} (mod 2) can be
represented by the integer u = (u_{n−1} ... u_1 u_0)_2. If u(x) and v(x) correspond to integers
u and v in this way, what polynomial corresponds to u ‡ v?

► 66. [M26] Suppose the polynomial u(x) has been represented as an n-bit integer u as
in exercise 65, and let v = u ⊕ (u ≪ δ) ⊕ (u ≪ 2δ) ⊕ (u ≪ 3δ) ⊕ ··· for some integer δ.

a) What’s a simple way to describe the polynomial v(x)?

b) Suppose n is large, and the bits of u have been packed into 64-bit words. How
would you compute v when δ = 1, using bitwise operations in 64-bit registers?

c) Consider the same question as (b), but when δ = 64.

d) Consider the same question as (b), but when δ = 3.

e) Consider the same question as (b), but when δ = 67.

67. [M31] If u(x) is a polynomial of degree < n, represented as in exercise 65, discuss
the computation of v(x) = u(x)^2 mod (x^n + x^m + 1), when 0 < m < n and both m
and n are odd. Hint: This problem has an interesting connection with perfect shuffling.
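Exercise 67 rests on the fact that squaring is especially easy mod 2. The cross terms 2u_j u_k x^{j+k} vanish, so u(x)^2 = Σ u_k x^{2k}, and the bits of u simply move to the even positions, like a perfect shuffle with a zero word. The C sketch below shows only this squaring step, for polynomials of degree < 32, and checks it against a plain carry-less product. The reduction mod x^n + x^m + 1 that the exercise discusses is not attempted, and the function names are hypothetical.

```c
#include <stdint.h>

/* Squares a binary polynomial u(x) of degree < 32 (bit k of u is the
   coefficient of x^k), mod 2: bit 2k of the result is bit k of u.
   Each step moves half of the remaining bits of every block apart. */
uint64_t gf2_square32(uint32_t u) {
  uint64_t x = u;
  x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
  x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
  x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
  x = (x | (x << 2))  & 0x3333333333333333ULL;
  x = (x | (x << 1))  & 0x5555555555555555ULL;
  return x;
}

/* Reference: carry-less (XOR-accumulating) product of two such polynomials. */
uint64_t gf2_mul32(uint32_t a, uint32_t b) {
  uint64_t r = 0;
  for (int k = 0; k < 32; k++)
    if ((b >> k) & 1)
      r ^= (uint64_t)a << k;
  return r;
}
```

For example, (1 + x)^2 = 1 + x^2 mod 2, so gf2_square32(3) yields 5.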

68. [20] What three MMIX instructions implement the δ-shift operation, (79)?

69. [25] Prove that method (80) always extracts the proper bits when the masks θ_k
have been set up properly: We never clobber any of the crucial bits y_j.

► 70. [31] (Guy L. Steele Jr., 1994.) What’s a good way to compute the masks θ_0, θ_1,
..., θ_{d−1} that are needed in the general compression procedure (80), given

71. [17] Explain how to reverse the procedure of (80), going from the compact value
y = (y_{r−1} ... y_1 y_0)_2 to a number z = (z_63 ... z_1 z_0)_2 that has z_{j_i} = y_i for 0 ≤ i < r.

72. [10] Simplify the expression (x ‡ y) ⇂ μ_0, when x, y < 2^{2^d}. (See Eqs. (76) and (81).)

73. [22] Prove that d sheep-and-goats steps will implement any 2^d-bit permutation.




74. [22] Given counts (c_0, c_1, ..., c_{2^d−1}) for the Chung–Wong procedure, explain why
an appropriate cyclic 1-shift can always produce new counts (c′_0, ..., c′_{2^d−1}) for which
Σ_l c′_{2l} = Σ_l c′_{2l+1}, thus allowing the recursion to proceed.

► 75. [32] The method of Chung and Wong replicates bit l of a register exactly c_l
times, but it produces results in scrambled order. For example, the case (c_0, ..., c_7) =
(1, 2, 0, 2, 0, 2, 0, 1) illustrated in the text produces (x_7 x_0 x_1 x_5 x_5 x_3 x_1 x_3)_2. In some
applications this can be a disadvantage; we might prefer to have the bits retain their
original order, namely (x_7 x_5 x_5 x_3 x_3 x_1 x_1 x_0)_2 in that example.

Prove that the permutation network P(2^d) of Fig. 12 can be modified to achieve
this goal, given any sequence of counts (c_0, c_1, ..., c_{2^d−1}), if we replace the d · 2^{d−1}
crossbar modules in the right-hand half by general 2 × 2 mapping modules. (A crossbar
module with inputs (a, b) produces either (a, b) or (b, a) as output; a mapping module
can also produce (a, a) or (b, b).)

76. [47] A mapping network is analogous to a sorting network or a permutation
network, but it uses 2 × 2 mapping modules instead of comparators or crossbars, and it
is supposed to be able to output all n^n possible mappings of its n inputs. Exercise 75,
in conjunction with Fig. 12, shows that a mapping network for n = 2^d exists with only
4d − 2 levels of delay, and with n/2 modules on each level; furthermore, this construction
needs general 2 × 2 mapping modules (instead of simple crossbars) in only d of those
levels. To within O(n), what is the smallest number G(n) of modules that are sufficient
to implement a general n-element mapping network?

77. [26] (R. W. Floyd and V. R. Pratt.) Design an algorithm that tests whether
or not a given standard n-network is a sorting network, as defined in the exercises
of Section 5.3.4. When the given network has r comparator modules, your algorithm
should use O(r) bitwise operations on words of length 2^n.

78. [M27] (Testing disjointness.) Suppose the binary numbers x_1, x_2, ..., x_m each
represent sets in a universe of n − k elements, so that each x_j is less than 2^{n−k}. J. H.
Quick (a student) decided to test whether the sets are disjoint by testing the condition

$$ x_1 \mathbin{|} x_2 \mathbin{|} \cdots \mathbin{|} x_m = (x_1 + x_2 + \cdots + x_m) \bmod 2^n. $$

Prove or disprove: Quick’s test is valid if and only if k ≥ lg(m − 1).

► 79. [20] If x ≠ 0 and x ⊆ χ, what is an easy way to determine the largest integer
x‵ < x such that x‵ ⊆ χ? (Thus (x‵)′ = (x′)‵ = x in connection with (84).)

80. [20] Suggest a fast way to find all maximal proper subsets of a set. More precisely,
given χ with νχ = m, we want to find all x ⊆ χ such that νx = m − 1.

81. [21] Find a formula for “scattered difference,” to go with the “scattered sum” (86).

82. [21] Is it easy to shift a scattered accumulator to the left by 1, for example to
change (y_2 x_4 x_3 y_1 x_2 y_0 x_1 x_0)_2 to (y_1 x_4 x_3 y_0 x_2 0 x_1 x_0)_2?

► 83. [28] Continuing exercise 82, find a way to shift a scattered 2^d-bit accumulator to
the right by 1, given z and χ, in O(d) steps.

84. [25] Given n-bit numbers z = (z_{n−1} ... z_1 z_0)_2 and χ = (χ_{n−1} ... χ_1 χ_0)_2, explain
how to calculate the “stretched” quantities z ← χ = (z_{(n−1)←χ} ... z_{1←χ} z_{0←χ})_2 and
z → χ = (z_{(n−1)→χ} ... z_{1→χ} z_{0→χ})_2, where

$$ j \leftarrow \chi = \max\{k \mid k \le j \text{ and } \chi_k = 1\}, \qquad j \rightarrow \chi = \min\{k \mid k \ge j \text{ and } \chi_k = 1\}; $$

we let z_{j←χ} = 0 if χ_k = 0 for 0 ≤ k ≤ j, and z_{j→χ} = 0 if χ_k = 0 for n > k ≥ j. For
example, if n = 11 and χ = (01101110010)_2, then z ← χ = (z_9 z_9 z_8 z_6 z_6 z_5 z_4 z_1 z_1 z_1 0)_2
and z → χ = (0 z_9 z_8 z_8 z_6 z_5 z_4 z_4 z_4 z_1 z_1)_2.

85. [22] (K. D. Tocher, 1954.) Imagine that you have a vintage 1950s computer
with a drum memory for storing data, and that you need to do some computations
with a 32 × 32 × 32 array a[i, j, k], whose subscripts are 5-bit integers in the range
0 ≤ i, j, k < 32. Unfortunately your machine has only a very small high-speed memory:
You can access only 128 consecutive elements of the array in fast memory at any time.
Since your application usually moves from a[i, j, k] to a neighboring position a[i′, j′, k′],
where |i − i′| + |j − j′| + |k − k′| = 1, you have decided to allocate the array so that, if
i = (i_4 i_3 i_2 i_1 i_0)_2, j = (j_4 j_3 j_2 j_1 j_0)_2, and k = (k_4 k_3 k_2 k_1 k_0)_2, the array entry a[i, j, k] is
stored in drum location (k_4 j_4 i_4 k_3 j_3 i_3 k_2 j_2 i_2 k_1 j_1 i_1 k_0 j_0 i_0)_2. By interleaving the bits in
this way, a small change to i, j, or k will cause only a small change in the address.

Discuss implementation of this addressing function: (a) How does it change when
i, j, or k changes by ±1? (b) How would you handle a random access to a[i, j, k], given
i, j, and k? (c) How would you detect a “page fault” (namely, the condition that a
new segment of 128 elements must be swapped into fast memory from the drum)?
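The following C sketch covers parts (a) through (c) under stated assumptions:

- The address layout is exactly the one above: i_t in bit 3t, j_t in bit 3t + 1, k_t in bit 3t + 2.
- A ±1 step in one coordinate fills the other bit positions so that carries and borrows skip over them. This follows the spirit of the “scattered sum” (86), whose exact form is not reproduced here.
- A random access spreads the bits with a simple loop; a 32-entry table would do the same job.
- A page is an aligned block of 128 = 2^7 consecutive locations.

All names are hypothetical, and coordinates wrap mod 32 rather than stopping at the edge of the array.

```c
#include <stdint.h>

/* Bit 3t of an address holds i_t, bit 3t+1 holds j_t, bit 3t+2 holds k_t. */
#define MASK_I 0x1249u   /* bits 0, 3, 6, 9, 12 */
#define MASK_J 0x2492u   /* bits 1, 4, 7, 10, 13 */
#define MASK_K 0x4924u   /* bits 2, 5, 8, 11, 14 */

/* Spreads the 5 bits of v into every third bit position. */
static unsigned spread3(unsigned v) {
  unsigned r = 0;
  for (int t = 0; t < 5; t++)
    r |= ((v >> t) & 1u) << (3 * t);
  return r;
}

/* (b) Random access: the drum location of a[i, j, k]. */
unsigned address(unsigned i, unsigned j, unsigned k) {
  return spread3(i) | (spread3(j) << 1) | (spread3(k) << 2);
}

/* (a) Adds 1 (mod 32) to the coordinate scattered in 'mask': setting all
   other bits to 1 lets the carry pass straight through them. */
unsigned scattered_inc(unsigned a, unsigned mask) {
  unsigned field = ((a | ~mask) + 1) & mask;
  return (a & ~mask) | field;
}

/* (a) Subtracts 1 (mod 32): with the other bits cleared, the borrow
   passes through them. */
unsigned scattered_dec(unsigned a, unsigned mask) {
  unsigned field = ((a & mask) - 1) & mask;
  return (a & ~mask) | field;
}

/* (c) With pages of 2^7 locations, the page number is the address >> 7. */
int page_fault(unsigned old_addr, unsigned new_addr) {
  return (old_addr >> 7) != (new_addr >> 7);
}
```

For example, stepping from a[7, 0, 0] to a[8, 0, 0] changes the address from 73 to 512 and crosses a page boundary, while most single steps stay within the same page.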

86. [M25] An array of 2^p × 2^q × 2^r elements is to be allocated by putting a[i, j, k]
into a location whose bits are the p + q + r bits of (i, j, k), permuted in some fashion.
Furthermore, this array is to be stored in an external memory using pages of size 2^s.
(Exercise 85 considers the case p = q = r = 5 and s = 7.) What allocation strategy
of this kind minimizes the number of times that a[i, j, k] is on a different page from
a[i′, j′, k′], summed over all i, j, k, i′, j′, and k′ such that |i − i′| + |j − j′| + |k − k′| = 1?

► 87. [20] Suppose each byte of a 64-bit word x contains an ASCII code that represents
either a letter, a digit, or a space. What three bitwise operations will convert all the
lowercase letters to uppercase?
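One possible answer, sketched in C under the assumption of ASCII codes. Among letters, digits, and the space, only letters have bit #40 set, and a letter is lowercase exactly when bit #20 is also set. A right shift by 1, a mask, and a final and-not (which MMIX provides as ANDN) therefore make three operations. The function name is hypothetical.

```c
#include <stdint.h>

/* Clears bit #20 of every byte whose bit #40 is set.  The mask also
   discards the bit that the shift carries in from the next byte. */
uint64_t to_upper8(uint64_t x) {
  uint64_t letter = (x >> 1) & 0x2020202020202020ULL;  /* bit #40 -> #20 */
  return x & ~letter;
}
```

Uppercase letters already have bit #20 clear, so they are unchanged. Digits and the space lack bit #40, so they are untouched too.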

88. [20] Given x = (x_7 ... x_0)_256 and y = (y_7 ... y_0)_256, compute z = (z_7 ... z_0)_256,
where z_j = (x_j − y_j) mod 256 for 0 ≤ j < 8. (See the addition operation in (87).)
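The addition operation (87) is not reproduced in this excerpt. The C sketch below assumes the familiar approach: add the low seven bits of each byte, so that no carry can cross a byte boundary, and then patch the high bits with ⊕. The subtraction follows the same plan with borrows. The high bit of each byte of x is set first, so no borrow can leave a byte. The constant and function names are hypothetical.

```c
#include <stdint.h>

static const uint64_t HIGH_BITS = 0x8080808080808080ULL;  /* bit 7 of each byte */

/* Bytewise (x_j + y_j) mod 256: the 7-bit sums cannot overflow a byte,
   and the true high bit is the carry into bit 7 XOR x_7 XOR y_7. */
uint64_t bytewise_add(uint64_t x, uint64_t y) {
  return ((x & ~HIGH_BITS) + (y & ~HIGH_BITS)) ^ ((x ^ y) & HIGH_BITS);
}

/* Bytewise (x_j - y_j) mod 256: each byte computes 128 + (low x) - (low y),
   which is positive, and the high bit is then corrected by x_7 XOR NOT y_7. */
uint64_t bytewise_sub(uint64_t x, uint64_t y) {
  return ((x | HIGH_BITS) - (y & ~HIGH_BITS)) ^ ((x ^ ~y) & HIGH_BITS);
}
```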

89. [23] Given x = (x_31 ... x_1 x_0)_4 and y = (y_31 ... y_1 y_0)_4, compute z = (z_31 ... z_1 z_0)_4,
where z_j = ⌊x_j/y_j⌋ for 0 ≤ j < 32, assuming that no y_j is zero.

90. [20] The bytewise averaging rule (88) always rounds downward when x_j + y_j is
odd. Make it less biased by rounding to the nearest odd integer in such cases.
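Rule (88) is not reproduced in this excerpt. The C sketch below assumes it is the familiar floor average. That average rests on x + y = 2(x & y) + (x ⊕ y), applied bytewise with a mask that stops bits from crossing byte boundaries. It rounds down exactly as exercise 90 describes and does not attempt the improved rounding the exercise asks for. The names are hypothetical.

```c
#include <stdint.h>

static const uint64_t LOW_BITS = 0x0101010101010101ULL;  /* bit 0 of each byte */

/* Bytewise floor((x_j + y_j) / 2).  Clearing bit 0 of each byte of
   x ^ y before the shift keeps it from landing in bit 7 of the byte
   below; the per-byte result is at most 255, so no carries cross bytes. */
uint64_t bytewise_avg_floor(uint64_t x, uint64_t y) {
  uint64_t z = (x ^ y) & ~LOW_BITS;
  return (x & y) + (z >> 1);
}
```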

► 91. [26] (Alpha channels.) Recipe (88) is a good way to compute bytewise averages,
but applications to computer graphics often require a more general blending of 8-bit
values. Given three octabytes x = (x_7 ... x_0)_256, y = (y_7 ... y_0)_256, α = (α_7 ... α_0)_256,
show that bitwise operations allow us to compute z = (z_7 ... z_0)_256, where each byte z_j
is a good approximation to ((255 − α_j)x_j + α_j y_j)/255, without doing any multiplication.
Implement your method with MMIX instructions.

► 92. [21] What happens if the second line of (88) is changed to z ← (x | y) − (z ≫ 1)?

93. [18] What basic formula for subtraction is analogous to formula (89) for addition?

94. [21] Let x = (x_7 ... x_1 x_0)_256 and t = (t_7 ... t_1 t_0)_256 in (90). Can t_j be nonzero
when x_j is nonzero? Can t_j be zero when x_j is zero?

95. [22] What’s a bitwise way to tell if all bytes of x = (x_7 ... x_1 x_0)_256 are distinct?

96. [21] Explain (93), and find a similar formula that sets test flags t_j ← 128[x_j < y_j].

97. [23] Leslie Lamport’s paper in 1975 presented the following “problem taken from
an actual compiler optimization algorithm”: Given octabytes x = (x_7 ... x_0)_256 and y =
(y_7 ... y_0)_256, compute t = (t_7 ... t_0)_256 and z = (z_7 ... z_0)_256 so that t_j ≠ 0 if and only
if x_j ≠ 0, x_j ≠ ’*’, and x_j ≠ y_j; and z_j = (x_j = 0? y_j: (x_j ≠ ’*’ ∧ x_j ≠ y_j? ’*’: x_j)).

98. [20] Given x = (x_7 ... x_0)_256 and y = (y_7 ... y_0)_256, compute z = (z_7 ... z_0)_256
and w = (w_7 ... w_0)_256, where z_j = max(x_j, y_j) and w_j = min(x_j, y_j) for 0 ≤ j < 8.

► 99. [28] Find hexadecimal constants a, b, c, d, e such that the six bitwise operations

$$ y \leftarrow x \oplus a, \qquad t \leftarrow ((((y \mathbin{\&} b) + c) \mathbin{|} y) \oplus d) \mathbin{\&} e $$

will compute the flags t = (f_7 ... f_1 f_0)_256 ≪ 7 from any bytes x = (x_7 ... x_1 x_0)_256, where

f_0 = [x_0 = ’!’], f_1 = [x_1 ≠ ’*’], f_2 = [x_2 < ’A’], f_3 = [x_3 > ’z’], f_4 = [x_4 > ’a’],

f_5 = [x_5 ∈ {’0’, ’1’, ..., ’9’}], f_6 = [x_6 < 168], f_7 = [x_7 ∈ {’<’, ’=’, …

100. [25] Suppose x = (x_15 ... x_1 x_0)_16 and y = (y_15 ... y_1 y_0)_16 are binary-coded decimal
numbers, where 0 ≤ x_j, y_j < 10 for each j. Explain how to compute their sum
u = (u_15 ... u_1 u_0)_16 and difference v = (v_15 ... v_1 v_0)_16, where 0 ≤ u_j, v_j < 10 and

$$ (u_{15} \ldots u_1 u_0)_{10} = \bigl((x_{15} \ldots x_1 x_0)_{10} + (y_{15} \ldots y_1 y_0)_{10}\bigr) \bmod 10^{16}, $$

$$ (v_{15} \ldots v_1 v_0)_{10} = \bigl((x_{15} \ldots x_1 x_0)_{10} - (y_{15} \ldots y_1 y_0)_{10}\bigr) \bmod 10^{16}, $$

without bothering to do any radix conversion.
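The C sketch below illustrates exercise 100 with a well-known trick for packed BCD; it is not claimed to be the book's answer. First add 6 to every digit, so that decimal carries become binary carries out of each 4-bit digit. Then take the 6 back from the digits that did not carry. Subtraction is reduced to addition through the nines’ complement. The function names are hypothetical, and the code assumes valid BCD inputs (every digit at most 9).

```c
#include <stdint.h>

/* Sum of two 16-digit BCD numbers, mod 10^16. */
uint64_t bcd_add(uint64_t x, uint64_t y) {
  uint64_t t1 = x + 0x6666666666666666ULL;   /* no carries: x_j + 6 <= 15 */
  uint64_t t2 = t1 + y;                      /* decimal carries are binary now */
  uint64_t carries = t2 ^ t1 ^ y;            /* carry into each bit position */
  /* Bit 4j (j >= 1) is 1 when digit j-1 did not carry into digit j. */
  uint64_t nocarry = ~carries & 0x1111111111111110ULL;
  uint64_t six = (nocarry >> 2) | (nocarry >> 3);  /* 6 in each such digit */
  if (t2 >= t1)                              /* digit 15 did not carry out */
    six |= 0x6000000000000000ULL;
  return t2 - six;                           /* no borrows: such digits are >= 6 */
}

/* Difference mod 10^16: x - y = x + (10^16 - 1 - y) + 1. */
uint64_t bcd_sub(uint64_t x, uint64_t y) {
  uint64_t nines = 0x9999999999999999ULL - y;  /* digitwise 9 - y_j, no borrows */
  return bcd_add(bcd_add(x, nines), 1);
}
```

For example, adding 1 to sixteen 9s yields 0, as arithmetic mod 10^16 requires.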

► 101. [22] Two octabytes x and y contain amounts of time, represented in five fields
that respectively signify days (3 bytes), hours (1 byte), minutes (1 byte), seconds
(1 byte), and milliseconds (2 bytes). Can you add and subtract them quickly, without
converting from this mixed-radix representation to binary and back again?

102. [25] Discuss routines for the addition and subtraction of polynomials modulo 5, 
when (a) 16 4-bit coefficients or (b) 21 3-bit coefficients are packed into a 64-bit word. 

► 103. [21] Sometimes it’s convenient to represent small numbers in unary notation, so
that 0, 1, 2, 3, ..., k appear respectively as (0)_2, (1)_2, (11)_2, (111)_2, ..., 2^k − 1 inside
the computer. Then max and min are easily implemented as | and &.

Suppose the bytes of x = (x_7 ... x_0)_256 are such unary numbers, while the bytes
of y = (y_7 ... y_0)_256 are all either 0 or 1. Explain how to “add” y to x or “subtract” y
from x, giving u = (u_7 ... u_0)_256 and v = (v_7 ... v_0)_256 where

$$ u_j = 2^{\min(8,\, \lg(x_j+1) + y_j)} - 1 \quad\text{and}\quad v_j = 2^{\max(0,\, \lg(x_j+1) - y_j)} - 1. $$


104. [22] Use bitwise operations to check the validity of a date represented in “year-month-day”
fields (y, m, d) as in (22). You should compute a value t that is zero if and
only if 1900 ≤ y ≤ 2100, 1 ≤ m ≤ 12, and 1 ≤ d ≤ max_day(m), where month m has
at most max_day(m) days. Can it be done in fewer than 20 operations?

105. [30] Given x = (x_7 ... x_0)_256 and y = (y_7 ... y_0)_256, discuss bitwise operations
that will sort the bytes into order, so that x_0 ≤ y_0 ≤ ··· ≤ x_7 ≤ y_7 afterwards.

106. [27] Explain the Fredman–Willard procedure (95). Also show that a simple
modification of their method will compute 2^{λx} without doing any left shifts.

► 107. [22] Implement Algorithm B on MMIX when d = 4, and compare it with (56).

108. [26] Adapt Algorithm B to cases where n does not have the form d·2^d.

109. [20] Evaluate ρx for n-bit numbers x in O(log log n) broadword steps.
► 110. [30] Suppose n = 2^{2^e} and 0 ≤ x < n. Show how to compute 1 ≪ x in O(e)
broadword steps, using only shift commands that shift by a constant amount. (Together
with Algorithm B we can therefore extract the most significant bit of an n-bit number
in O(log log n) such steps.)

111. [23] Explain the 01^r pattern recognizer, (98).

112. [46] Can all occurrences of the pattern 1^r 0 be identified in O(1) broadword steps?

113. [23] A strong broadword chain is a broadword chain of a specified width n that
is also a 2-adic chain, for all n-bit choices of x_0. For example, the 2-bit broadword
chain (x_0, x_1) with x_1 = x_0 + 1 is not strong because x_0 = (11)_2 makes x_1 = (00)_2.
But (x_0, x_1, ..., x_4) is a strong broadword chain that computes (x_0 + 1) mod 4 for all
0 ≤ x_0 < 4 if we set x_1 = x_0 ⊕ 1, x_2 = x_0 & 1, x_3 = x_2 ≪ 1, and x_4 = x_1 ⊕ x_3.

Given a broadword chain (x_0, x_1, ..., x_r) of width n, construct a strong broadword
chain (x′_0, ..., x′_{r′}) of the same width, such that r′ = O(r) and (x_0, x_1, ..., x_r) is a
subsequence of (x′_0, x′_1, ..., x′_{r′}).

114. [16] Suppose (x_0, x_1, ..., x_r) is a strong broadword chain of width n that computes
the value f(x) = x_r whenever an n-bit number x = x_0 is given. Construct a
broadword chain (X_0, X_1, ..., X_r) of width mn that computes X_r = (f(ξ_1) ... f(ξ_m))_{2^n}
for any given mn-bit value X_0 = (ξ_1 ... ξ_m)_{2^n}, where 0 ≤ ξ_1, ..., ξ_m < 2^n.

► 115. [24] Given a 2-adic integer x = (... x_2 x_1 x_0)_2, we might want to compute y =
(... y_2 y_1 y_0)_2 = f(x) from x by zeroing out all blocks of consecutive 1s that (a) are
not immediately followed by two 0s; or (b) are followed by an odd number of 0s
before the next block of 1s begins; or (c) contain an odd number of 1s. For example,
if x is (... 01110111001101000110)_2 then y is (a) (... 00000111000001000110)_2;
(b) (... 00000111000000000110)_2; (c) (... 00000000001100000110)_2. (Infinitely many
0s are assumed to appear at the right of x_0. Thus, in case (a) we have

$$ y_j = x_j \wedge \bigl((\bar x_{j-1} \wedge \bar x_{j-2}) \vee (x_{j-1} \wedge \bar x_{j-2} \wedge \bar x_{j-3}) \vee (x_{j-1} \wedge x_{j-2} \wedge \bar x_{j-3} \wedge \bar x_{j-4}) \vee \cdots\bigr) $$

for all j, where x_k = 0 for k < 0.) Find 2-adic chains for y in each case.

116. [HM30] Suppose x = (... x_2 x_1 x_0)_2 and y = (... y_2 y_1 y_0)_2 = f(x), where y is
computable by a 2-adic chain having no shift operations. Let L be the set of all binary
strings such that y_j = [x_j ... x_1 x_0 ∈ L], and assume that all constants used in the chain
are rational 2-adic numbers. Prove that L is a regular language. What languages L
correspond to the functions in exercise 115(a) and 115(b)?

117. [HM46] Continuing exercise 116, is there any simple way to characterize the regular
languages L that arise in shift-free 2-adic chains? (The language L = 0*(10*10*)*
does not seem to correspond to any such chain.)

118. [30] According to Lemma A, we cannot compute the function x ≫ 1 for all n-bit
numbers x by using only additions, subtractions, and bitwise Boolean operations
(no shifts or branches). Show, however, that O(n) such operations are necessary and
sufficient if we include also the “monus” operator y ∸ z in our repertoire.

119. [20] Evaluate the function f_{py}(x) in (102) with four broadword steps.

► 120. [M25] There are 2^{n·2^{nm}} functions that take n-bit numbers (x_1, ..., x_m) into an
n-bit number f(x_1, ..., x_m). How many of them can be implemented with addition,
subtraction, multiplication, and nonshift bitwise Boolean operations (modulo 2^n)?

► 121. [M25] By exercise 3.1–6, a function from [0 .. 2^n) into itself is eventually periodic.

a) Prove that if f is any n-bit broadword function that can be implemented without
shift instructions, the lengths of its periods are always powers of 2.

b) However, for every p between 1 and n, there’s an n-bit broadword chain of length 3
that has a period of length p.

122. [M22] Complete the proof of Lemma B. 

123. [M23] Let a_q be the constant 1 + 2^q + 2^{2q} + ··· + 2^{(q−1)q} = (2^{q^2} − 1)/(2^q − 1).
Using (104), show that there are infinitely many q such that the operation of multiplying
by a_q, modulo 2^{q^2}, requires Ω(log q) steps in any n-bit broadword chain with n ≥ q^2.

124. [M38] Complete the proof of Theorem R′ by defining an n-bit broadword chain
(x_0, x_1, ..., x_f) and sets (U_0, U_1, ..., U_f) such that, for 0 ≤ t ≤ f, all inputs x ∈ U_t lead
to an essentially similar state Q(x, t), in the following sense: (i) The current instruction
in Q(x, t) does not depend on x. (ii) If register r_j has a known value in Q(x, t), it holds
x_{j′} for some definite index j′ < t. (iii) If memory location M[z] has been changed, it
holds x_{z″} for some definite index z″ < t. (The values of j′ and z″ depend on j, z,
and t, but not on x.) Furthermore |U_t| ≥ n/2^{2^t−1}, and the program cannot guarantee
that r_1 = ρx when t < f. Hint: Lemma B implies that a limited number of shift
amounts and memory addresses need to be considered when t is small.

125. [MSS] Prove Theorem P′. Hint: Lemma B remains true if we replace ‘= 0’ by
‘= a_s’ in (103), for any values a_s.

126. [M46] Does the operation of extracting the most significant bit, 2^{λx}, require
Ω(log log n) steps in an n-bit basic RAM? (See exercise 110.)

127. [20] Prove that if there’s a way to carry out sideways addition of n-bit numbers
in O(log log n) broadword steps, then every symmetric function of a number’s n bits
can also be done in O(log log n) broadword steps.

128. [M46] Does sideways addition require Ω(log n) broadword steps?

129. [M46] Can the parity function (νx) mod 2 be computed in O(1) broadword
steps?

130. Is there an n-bit constant a such that the function (a ≪ x) mod 2^n requires
Ω(log n) n-bit broadword steps?

► 131. [23] Write an MMIX program for Algorithm R when the graph is represented by 
arc lists. Vertex nodes have at least two fields, called LINK and ARCS, and arc nodes have 
TIP and NEXT fields, as explained in Section 7. Initially all LINK fields are zero, except 
in the given set of vertices Q, which is represented as a circular list. Your program 
should change that circular list so that it represents the set R of all reachable vertices. 

► 132. [M27] A clique in a graph is a set of mutually adjacent vertices; a clique is
maximal if it’s not contained in any other. The purpose of this exercise is to discuss
an algorithm due to J. K. M. Moody and J. Hollis, which provides a convenient way
to find every maximal clique of a not-too-large graph, using bitwise operations.

Suppose G is a graph with n vertices V = {0, 1, ..., n − 1}. Let ρ_v = Σ{2^u | u −−− v or
u = v} be row v of G’s reflexive adjacency matrix, and let δ_v = Σ{2^u | u ≠ v} = 2^n − 1 − 2^v.
Every subset U ⊆ V is representable as an n-bit integer σ(U) = Σ_{u∈U} 2^u; for example,
δ_v = σ(V \ v). We also define the bitwise intersection

$$ \tau(U) = \mathop{\&}_{0 \le u < n} (u \in U ?\ \rho_u : \delta_u). $$

a) Prove that U is a clique if and only if τ(U) = σ(U).

b) Show that if τ(U) = σ(T) then T is a clique.

c) For 1 ≤ k ≤ n, consider the 2^k bitwise intersections

$$ C_k = \Bigl\{ \mathop{\&}_{0 \le u < k} (u \in U ?\ \rho_u : \delta_u) \Bigm| U \subseteq \{0, 1, \ldots, k-1\} \Bigr\}, $$

and let C_k^+ be the maximal elements of C_k. Prove that U is a maximal clique if
and only if σ(U) ∈ C_n^+.

d) Explain how to compute C_k^+ from C_{k−1}^+, starting with C_0^+ = {2^n − 1}.

► 133. [20] Given a graph G, how can the algorithm of exercise 132 be used to find
(a) all maximal independent sets of vertices? (b) all minimal vertex covers (sets that
hit every edge)?

134. [15] Nine classes of mappings for ternary values appear in (119), (123), and (124).
To which class does the representation (128) belong, if a = 0, b = *, c = 1?

135. [22] Łukasiewicz included a few operations besides (127) in his three-valued logic:
¬x (negation) interchanges 0 with 1 but leaves * unchanged; ◇x (possibility) is defined
as ¬x ⇒ x; □x (necessity) is defined as ¬◇¬x; and x ⇔ y (equivalence) is defined as
(x ⇒ y) ∧ (y ⇒ x). Explain how to perform these operations using representation (128).

136. [29] Suggest two-bit encodings for binary operations on the set {a, b, c} that are
defined by the following “multiplication tables”:

[Multiplication tables (a), (b), (c).]

137. [21] Show that the operation in exercise 136(c) is simpler with packed vectors
like (131) than with the unpacked form (130).

138. [24] Find an example of three-state-to-two-bit encoding where class V a is best. 

139. [25] If x and y are signed bits 0, +1, or −1, what 2-bit encoding is good for
calculating their sum (z_1 z_2)_3 = x + y, where z_1 and z_2 are also required to be signed
bits? (This is a “half adder” for balanced ternary numbers.)

140. [27] Design an economical full adder for balanced ternary numbers: Show how
to compute signed bits u and v such that 3u + v = x + y + z when z ∈ {0, +1, −1}.

► 141. [30] The Ulam numbers (U_1, U_2, ...) = (1, 2, 3, 4, 6, 8, 11, 13, 16, 18, 26, ...) are
defined for n ≥ 3 by letting U_n be the smallest integer > U_{n−1} that has a unique
representation U_n = U_j + U_k for 0 < j < k < n. Show that a million Ulam numbers
can be computed rapidly with the help of bitwise techniques.
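The C sketch below shows one bitwise approach to exercise 141, assuming a fixed search bound LIMIT (a hypothetical parameter). Two bit vectors record which integers are sums of two distinct Ulam numbers found so far: one for sums seen at least once, one for sums seen at least twice. Each new Ulam number u updates both vectors 64 sums at a time, by shifting the bit vector of earlier Ulam numbers left by u. The cost grows roughly like the number of Ulam numbers times LIMIT/64. So this only suggests the flavor of a solution; computing a million of them rapidly needs more care.

```c
#include <stdint.h>
#include <string.h>

#define LIMIT 4096                 /* search bound on values (an assumption) */
#define WORDS (LIMIT / 64)

static uint64_t ulam[WORDS];       /* bit u: u is an Ulam number found so far */
static uint64_t once[WORDS];       /* bit s: s = U_j + U_k in at least one way */
static uint64_t twice[WORDS];      /* bit s: in at least two ways */

static int get_bit(const uint64_t *v, int i) {
  return (int)((v[i >> 6] >> (i & 63)) & 1);
}

/* Records the sums u + v for every earlier Ulam number v: that set of
   sums is the bit vector 'ulam' shifted left by u bits. */
static void add_sums(int u) {
  int q = u >> 6, r = u & 63;
  for (int i = 0; i < WORDS; i++) {
    uint64_t s = 0;
    if (i - q >= 0) s = ulam[i - q] << r;
    if (r != 0 && i - q - 1 >= 0) s |= ulam[i - q - 1] >> (64 - r);
    twice[i] |= once[i] & s;
    once[i] |= s;
  }
}

/* Stores up to n Ulam numbers below LIMIT in out[]; returns how many. */
int ulam_numbers(int n, int *out) {
  memset(ulam, 0, sizeof ulam);
  memset(once, 0, sizeof once);
  memset(twice, 0, sizeof twice);
  int count = 0, u = 1;
  while (count < n && u < LIMIT) {
    out[count++] = u;
    add_sums(u);                               /* before u joins the set */
    ulam[u >> 6] |= (uint64_t)1 << (u & 63);
    if (u == 1) { u = 2; continue; }           /* U_1 = 1 and U_2 = 2 are given */
    int t = u + 1;                             /* smallest t > u with one representation */
    while (t < LIMIT && !(get_bit(once, t) && !get_bit(twice, t))) t++;
    u = t;
  }
  return count;
}
```

Calling ulam_numbers(11, out) fills out[] with the eleven values listed in the exercise.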

► 142. [33] A subcube such as *10*1*01 can be represented by asterisk codes 10010100
and bit codes 01001001, as in (85); but many other encodings are also possible. What
representation scheme for subcubes works best, for finding prime implicants by the 
consensus-based algorithm of exercise 7.1.1-31? 

143. [20] Let x be a 64-bit number that represents an 8 × 8 chessboard, with a 1 bit
in every position where a knight is present. Find a formula for the 64-bit number f(x)
that has a 1 in every position reachable in one move by a knight of x. For example,
the white knights at the start of a game correspond to x = #42; then f(x) = #a51800.
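A C sketch for exercise 143, under the bit-numbering convention implied by the example: square (rank r, file c) is bit 8r + c, with files a through h numbered 0 through 7. So #42 puts knights on b1 and g1. The edge masks are one possible choice, and the function name is hypothetical.

```c
#include <stdint.h>

/* Each of the eight knight moves is a shift by 6, 10, 15, or 17; masking
   the destination files discards moves that wrapped around an edge. */
uint64_t knight_targets(uint64_t x) {
  const uint64_t not_a  = 0xFEFEFEFEFEFEFEFEULL;  /* every file except a */
  const uint64_t not_ab = 0xFCFCFCFCFCFCFCFCULL;  /* except a and b */
  const uint64_t not_h  = 0x7F7F7F7F7F7F7F7FULL;  /* except h */
  const uint64_t not_gh = 0x3F3F3F3F3F3F3F3FULL;  /* except g and h */
  return ((x << 17) & not_a)  | ((x << 15) & not_h)    /* two ranks up */
       | ((x << 10) & not_ab) | ((x << 6)  & not_gh)   /* one rank up */
       | ((x >> 6)  & not_ab) | ((x >> 10) & not_gh)   /* one rank down */
       | ((x >> 15) & not_a)  | ((x >> 17) & not_h);   /* two ranks down */
}
```

With x = #42 this yields #a51800, the squares a3, c3, d2, e2, f3, and h3.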

144. [16] What node is the sibling of node j in a sideways heap? (See (134).)

145. [17] Interpret (137) when h is less than the height of j.
► 146. [M20] Prove Eq. (138), which relates the ρ and λ functions.

► 147. [M20] What values of π_v, β_v, α_v, and τ_j occur in Algorithm V when the forest is

a) the empty digraph with vertices {v_1, ..., v_n} and no arcs?

b) the oriented path v_n → ··· → v_2 → v_1?

148. [M21] When preprocessing for Algorithm V, is it possible to have
βx_3 → βy_2 → βx_2 → βy_1 → βx_1 in S when x_3 → x_2 → x_1 → Λ and y_2 → y_1 → Λ
in the forest? (If so, two different trees are “entangled” in S.)

► 149. [23] Design a preprocessing procedure for Algorithm V. 

► 150. [25] Given an array of elements A_1, ..., A_n, the range minimum query problem
is to determine k(i, j) such that A_{k(i,j)} = min(A_i, ..., A_j) for any given indices i and j
with 1 ≤ i ≤ j ≤ n. Prove that Algorithm V will solve this problem, after O(n) steps of
preprocessing on the array A have prepared the necessary tables (π, β, τ). Hint: Consider
the binary search tree constructed from the sequence of keys (p(1), p(2), ..., p(n)),
where p is a permutation of {1, 2, ..., n} such that A_{p(1)} ≤ A_{p(2)} ≤ ··· ≤ A_{p(n)}.

151. [22] Conversely, show that any algorithm for range minimum queries can be used 
to find nearest common ancestors, with essentially the same efficiency. 

152. [M21] Prove that Algorithm V is correct. 

► 153. [M20] The pointers in a navigation pile like (144) can be packed into a binary
string such as

[Example packed string of pointer bits, with bit positions 2, 4, ..., 24 marked from the left.]

At what bit position (from the left) does the pointer for node j end?

154. [20] The gray lines in Fig. 14 show how each pentagon is composed of ten 
triangles. What decomposition of the hyperbolic plane is defined by those gray lines 
alone, without the black pentagon edges? 

► 155. [M21] Prove that (xφ) mod 1 = (α0)_{1/φ} when α is the negaFibonacci code for x.

156. [21] Design algorithms (a) to convert a given integer x to its negaFibonacci
code α, and (b) to convert a given negaFibonacci code α to x = N(α).

157. [M21] Explain the recursion (148) for negaFibonacci predecessor and successor.

158. [M26] Let α = a_n ... a_1 be the binary code for N(α0) = a_n F_{n+1} + ··· + a_1 F_2
in the standard Fibonacci number system (146). Develop methods analogous to (148)
and (149) for incrementing and decrementing such codewords.

159. [M34] Exercise 7 shows that it’s easy to convert between the negabinary and
binary number systems. Discuss conversion between negaFibonacci codewords and the
ordinary Fibonacci codes in exercise 158.

160. [M29] Prove that ( 150 ) and ( 151 ) yield consistent code labels for the pentagrid. 

161. [20] The cells of a chessboard can be colored black and white, so that neighboring 
cells have different colors. Does the pentagrid also have this property? 

► 162. [HM37] Explain how to draw the pentagrid. Fig. 14. What circles are present? 

163. [HM^.1 ] Devise a way to navigate through the triangles in the tiling of Fig. 18. 

164. [23] The original definition of custerization in 1957 was not ( 157 ) but 

custer^X) = X k 〜 (X NW & X N & X NE & X w & X E & X sw & X s & X SE ). 

Why is ( 157 ) preferable? 
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165* [21 ] (R. A. Kirsch.) Discuss the computation of the 3x3 cellular automaton with 

X (t+1) = custer(X (t) )= 〜 X ⑷ & \ \ \ X^). 

166. [M23] Let /(M ， 7V) be the maximum number of black pixels in an M x TV 
bitmap X for which X = custer(X). Prove that /(M, N)= 鲁 MiV + 0(M + N). 

167. [24] (Life.) If the bitmap X represents an array of cells that are either dead (0) 
or alive (1)，the Boolean function 



• • ， 尤 SE) = [2 < XNW +^N +^NE -\-Xw + \x-\-X-E -\-Xsw +^S + 尤 SE < 4] 


can lead to astonishing life histories when it governs a cellular automaton as in ( 158 ). 

a) Find a way to evaluate / with a Boolean chain of 26 steps or less. 

b) Let denote row j of X at time t. Show that can be evaluated in 

r I > * % 


at most 23 broadword steps, as a function of the three rows X) 


⑴ 


墙 . 


j 


X ] 


⑴ 


anc 


► 168. [23] To keep an image finite，we might insist that a 3 X 3 cellular automaton 
treats a M xN bitmap as a torus, wrapping around seamlessly between top and bottom 
and between left and right. The task of simulating its actions efficiently with bitwise 
operations is somewhat tricky: We want to minimize references to memory, yet each 
new pixel value depends on old values that lie on all sides. Furthermore the shifting of 
bits between neighboring words tends to be awkward, taxing the capacity of a register. 

Show that such difficulties can be surmounted by maintaining an array of n-bit 
words Ajk for 0 < j < M and 0 < A; < = \N/ (n —2)]. li j ^ M and A; # 0, word Ajk 

should contain the pixels of row j and columns (k — l)(n — 2) through k(n — 2) + 1 ， 
inclusive; the other words AMk and Ajo provide auxiliary buffer space. (Notice that 
some bits of the raster appear twice.) 

169. [22] Continuing the previous two exercises, what happens to the Cheshire cat of 
Fig. 17(a) when it is subjected to the vicissitudes of Life, in a 26 X 31 torus? 

► 170. [21 ] What result does the Guo—Hall thinning automaton produce when given a 
solid black rectangle of M rows and N columns? How long does it take? 


171. [24] Find a Boolean chain of length < 25 to evaluate the local thinning function 
p(x NW ， x N ， x NE ， x sw ,x s , x S e) of ( 159 )，with or without the extra cases in ( 160 ). 

172. [M29] Prove or disprove: If a pattern contains three black pixels that are king- 
neighbors of each other，the Guo—Hall procedure extended by ( 160 ) will reduce it, 
unless none of those pixels can be removed without destroying the connectivity. 


► 173. [M30] Raster images often need to be cleaned up if they contain noisy data. For 
example, accidental specks of black or white may well spoil the results when a thinning 
algorithm is used for optical character recognition. 

Say that a bitmap X is closed if every white pixel is part of a 2 X 2 square of 
white pixels, and open if every black pixel is part of a 2 X 2 square of black pixels. Let 

X D = & {y I y D X and y is closed}; X L = | {F | F C X and F is open}. 

A bitmap is called clean if it equals X DL for some X. We might, for example, have 

X = Jj-.; X D = J% ; X DL = J . 

In general X D is “darker” than X. while X L is “lighter ”： X D D X D X L . 

a) Prove that (X DL ) DL = X DL . Hint: X CY implies X D C Y D and X L C Y L . 

b) Show that X D can be computed with one step of a 3 X 3 cellular automaton. 
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174. [M^6] (M. Minsky and S. Papert.) Is there a three-dimensional shrinking algo¬ 
rithm that preserves connectivity, analogous to ( 161 )? 

175. [15] How many rookwise connected black components does the Cheshire cat have? 

176. [M24] Let G be the graph whose vertices are the black pixels of a given bitmap X ， 

with u - v when u and v are a king move apart. Let G 1 be the corresponding graph 

after the shrinking transformation ( 161 ) has been applied. The purpose of this exercise 
is to show that the number of connected components of G 1 is the number of components 
of G minus the number of isolated vertices of G. 

Let N (iJ) = {(i, j), (x-1, j), (i-1, j+1), (i, j+1)} be pixel (ij) together with its 
north and/or east neighbors. For each v G let S(v) = {v f G G 7 | G N v }. 

a) Prove that S(v) is empty if and only if v is isolated in G. 

b) If u — v in G. u 1 G S(u)^ and v 1 G S(v). prove that u 1 —— v 1 in G ! . 

c) For each v G G f let = \y ^ G \ v ^ N v }. Is S f always nonempty? 

d) If u — v m G'^ u S'(u')^ and v G S'(v')^ prove that u —— * r in G. 

e) Hence there’s a one-to-one correspondence between the nontrivial components 
of G and the components of G 1 • 

177. [M22] Continuing exercise 176, prove an analogous result for the white pixels. 

178. [20] If X is an M x TV bitmap, 
let X* be the M X (27V + 1) bitmap 
X J (X I (X 1)). Show that the 
kingwise connected components of 
X* are also rookwise connected, and 
that bitmap X* has the same “sur- 
roundedness tree” ( 162 ) as X. 

► 179. [34] Design an algorithm that constructs the surroundedness tree of a given 
M X N bitmap，scanning the image one row at a time as discussed in the text. (See 
( 162 ) and ( 163 ).) 

► 180. [M24] Digitize the hyperbola y 2 = x 2 -\- 13 by hand, for 0 < y < 7. 

181. [HM20] Explain how to subdivide a general conic ( 168 ) with rational coefficients 
into monotonic parts so that Algorithm T applies. 

182. [M31 ] Why does the three-register method (Algorithm T) digitize correctly? 

► 183. [M22] Find a quadratic form Q ，（ x ， y) so that, when Algorithm T is applied to 
(x\ y’ ）， (x ， y ) ， and Q\ it produces exactly the same edges as it does from (x ， y\ (x\ y ’）， 
and Q，but in the reverse order. 

► 184. [22] Design an algorithm that properly digitizes a straight line from to 

(f’， 7 /)，when T]. €’， and r/ are rational numbers, by simplifying Algorithm T. 

185. [HM22] Given three complex numbers ( 2 : 0 , 2 : 1 , 22 )，consider the curve traced out by 

B{t) = (1 — t) 2 zo + 2(1 — t)tz\ + t 2 Z 2 , for 0 < t < 1- 

a) What is the approximate behavior of B(t) when t is near 0 or 1? 

b) Let S(zo, zi : Z 2 ) = {5(t) I 0 < t < 1}. Prove that all points of S(zo, Z 2 ) lie 
on or inside the triangle whose vertices are zo, Z \, and 2 : 2 . 

c) True or false? + (^ 丄 ，… + (^ 2 ： 2 ) = w 之 1 ， 2 ： 2 ). 

d) Prove that S(zo^ Zi^ Z 2 ) is part of a straight line if and only if Z \, and Z 2 are 
collinear; otherwise it is part of a parabola. 
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e) Prove that if 0 < ^ < 1, we have the recurrence 

S(z 0 ,z 1 ,z 2 ) = S(z 0 , (l-O)zo + Ozx.BiO)) U S(B(0), (l-0)z 1 + Oz 2 ,z 2 ). 

186. [M29] Continuing exercise 185, show how to digitize S ( 2 ： o, 之 1 ，之 2 ) using the three- 
register method (Algorithm T). For best results, the digitizations of S(z 2 , >^i, >^o) and 
S(zo. Z\^Z 2 ) should produce the same edges, but in reverse order. 

► 187* [25] Bitmap images can often be viewed conveniently using pixels that are shades 
of gray instead of just black or white. Such gray levels typically are 8 -bit values that 
range from 0 (black) to 255 (white); notice that the black/white convention is tradition¬ 
ally reversed with respect to the 1-bit case. An m X n bitmap whose resolution is 600 
dots per inch corresponds nicely to the (m/ 8 ) X (n/ 8 ) grayscale image with 75 pixels 
per inch that is obtained by mapping each 8 x 8 subarray of 1 -bit pixels into the gray 
level [255(1 — k/64) 1 〆 7 + |」，where 7 = 1.3 and k is the number of Is in the subarray. 

Write an MMIX routine that converts a given m X n array BITMAP into the corre¬ 
sponding (m/8) X (n/8) image GRAYMAP, assuming that m is a multiple of 8 and that n 
is a multiple of 64. 

188. [25] Given a 64 x 64 bitmap ， what’s a good way (a) to transpose it, or (b) to 
rotate it by 90°, using operations on 64-bit numbers? 

189. [23] A parity pattern of length m and width n is an m X n matrix of Os and Is 
with the property that each element is the sum of its neighbors, mod 2. For example, 


11 

0011 
0100 
1101 ^ 
0101 

01010 

100 

110 


OHIO 

10101 

00 , 

11011 , 

101 , 

and 

11011 

11 

01010 

Oil 

001 


10101 

OHIO 


are parity patterns of sizes 3 x 2, 4 x 4 ， 3 x 5， 5 x 3 ， and 5x5. 

a) If the binary vectors a \, …， Oim are the rows of a parity pattern, show that 

a 2 , • •• ， Oirn can all be computed from the top row ai by using bitwise operations. 
Thus at most one m X n parity pattern can begin with any given bit vector. 

b) True or false: The sum (mod 2) of two m X n parity patterns is a parity pattern. 

c) A parity pattern is called perfect if it contains no all-zero row or column. For 
example, three of the matrices above are perfect, but the 3x2 and 3x5 examples 
are not. Show that every m X n parity pattern contains a perfect parity pattern 
as a submatrix. Furthermore, all such submatrices have the same size, m! X n ’， 
where rn + 1 is a divisor of m + 1 and n 7 + 1 is a divisor n + 1 . 

d) There’s a perfect parity pattern whose first row is 0011. but there is no such 

pattern beginning with 01010. Is there a simple way to decide whether a given 
binary vector is the top row of a perfect parity pattern? 厂 、 

e) Prove that there’s a unique perfect parity pattern that begins with 1 0 . •. 0. 

190. [M30] A wraparound parity pattern is analogous to the parity patterns of exer¬ 
cise 189, except that the leftmost and rightmost elements of each row are also neighbors. 

a) Find a simple relation between the parity pattern of width n that begins with a 
and the wraparound parity pattern of width 2 n + 2 that begins with 

b) The Fibonacci polynomials Fj{x) are defined by the recurrence 
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F 0 (x) = 0, Fi(x) = 1, 


and = xFj{x) + Fj-i (x) for j > 1. 
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Show that there’s a simple relation between the wraparound parity patterns that 
begin with 10... 0 (TV—1 zeros) and the Fibonacci polynomials modulo x N + 1. 
Hint: Consider Fj [x 一 1 + 1 + x). and do arithmetic mod 2 as well as mod x A + 1. 

c) If a is the binary string a\ ... a n . let fcx(x) = a\x + • • • + a n x n . Show that 

f(a 0cx R )( x ) = (/ 。 ( 尤 )+ mod (x N - 1 ) and mod 2 ， 

V 3 j J 

when TV = 2 n + 2 and aj is row j of a width-n parity pattern that begins with a. 

d) Consequently we can compute aj from a in only 0(n 2 log j) steps. Hints: See ex¬ 
ercise 4.6.3—26; and use the identity Fm+ n (x) = F rn (x)F n ^i (x) + i^m-i (x)F n {x)^ 
which generalizes Eq. 1.2. 8 -( 6 ). 

191. [HM38] The shortest parity pattern that begins with a given string can be quite 
long; for example, it turns out that the perfect pattern of width 120 whose first row is 
10 ... 0 has length 36,028.797,018,963,966(!). The purpose of this exercise is to consider 
how to calculate the interesting function 


c[q) = 1 + max{ m \ there exists a perfect parity pattern of length m and width g— 1 }， 


whose initial values (1 ， 3,4, 6 , 5, 24, 9. 12, 28) for 1 < g < 9 are easy to compute by hand. 

a) Characterize c(q) algebraically, using the Fibonacci polynomials of exercise 190. 

b) Explain how to calculate c(q) if we know a number M such that c(q) divides M, 
and if we also know the prime factors of M• 

c) Prove that c(2 e ) = 3 - 2 e ~ 1 when e > 0. Hint: has a simple form, mod 2. 

d) Prove that when q is odd and not a multiple of 3 ， c(q) is a divisor of 2 2e — 1 ， 
where e is the order of 2 modulo q. Hint: F 2 ^-i(y) has a simple form，mod 2. 

e) What happens when q is an odd multiple of 3? 

f) Finally, explain how to handle the case when q is even. 

192. [M21 ] If a perfect m X n parity pattern exists, 
when m and n are odd, show that there’s also a perfect 
(2m+l) X (2n+l) parity pattern. (Intricate fractals arise 
when this observation is applied repeatedly; for example, 
the 5x5 pattern in exercise 189 leads to Fig. 20.) 

193. [24] Find all n < 383 for which there exists a 
perfect n X n parity pattern with 8 - fold symmetry, such 
as the example in Fig. 20. Hint: The diagonal elements 
of all such patterns must be zero. 

194. [HM25] Let A be a binary matrix having rows 



Fig. 20. A perfect 
383 x 383 parity pattern. 


ai ， …， am of length n. Explain how to use bitwise operations to compute the 
rank m — r oi A over the binary field {0,1}. and to find linearly independent binary 
vectors .... 0 r of length m such that OjA = 0... 0 for 1 < j < r. Hint: See the 
“triangularization” algorithm for null spaces, Algorithm 4.6.2N. 


195* [21 ] (K. Thompson, 1992.) Integers in the range 0 < x < 2 31 can be encoded as 
a string of up to six bytes a{x) = a\ ... ai \n the following way: If x < 2 7 , set l 1 and 
ai 4 — x. Otherwise let x = (X 5 • • • xiXo) 64 ; set l 4 — 「 (Ax)/5]. 2 8 — 2 8 — z +a^—i ， and 

aj = 2 7 for 2 < jf < /. Notice that a(x) contains a zero byte if and only if x = 0. 

a) What are the encodings of # a ， # 3a3 ， # 7b97, and # 1D141? 

b) If x < x ! ^ prove that a(x) < a{x r ) in lexicographic order. 

c) Suppose a sequence of values x^x^ 2 \ . has been encoded as a byte string 

a • ••a(x( n ))，and let ak be the A:th byte in that string. Show that 
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it’s easy to determine the value x ⑴ from which ak came, by looking at a few of 
the neighboring bytes if necessary. 

196. [22] The Universal Character Set (UCS), also known as Unicode, is a standard 
mapping of characters to integer codepoints x in the range 0 < x < 2 20 + 2 16 . An 
encoding called UTF-16 represents such integers as one or two wydes (5{x) = f3\ or 
/3(x) = /3i 卢 2 , in the following way: If x < 2 16 then /3(x) = X ： otherwise 

f3i = # d800 + l_y/2 10 」and 冷 2 = # dcOO + (y mod 2 10 )，where y = x — 2 16 . 

Answer questions (a) ， (b). and (c) of exercise 195 for this encoding. 

► 197. [21 ] Unicode characters are often represented as strings of bytes using a scheme 
called UTF-8, which is the encoding of exercise 195 restricted to integers in the range 
0 < x < 2 20 +2 16 . Notice that UTF-8 efficiently preserves the standard ASCII character 
set (the codepoints with x < 2 7 ), and that it is quite different from UTF-16. 

Let ai be the first byte of a UTF-8 string a(x). Show that there are reasonably 
small integer constants a ，6 ， and c such that only four bitwise operations 

(a 》 ((ai > 6) & c)) & 3 

suffice to determine the number / — 1 of bytes between a\ and the end of a(x). 

► 198. [23] A person might try to encode # a as # c08a or # e0808a or # f080808a in 
UTF-8, because the obvious decoding algorithm produces the same result in each case. 
But such unnecessarily long forms are illegal, because they could lead to security holes. 

Suppose ai and a .2 are bytes such that a\ > # 80 and # 80 < OL 2 < # c0. Find 
a branchless way to decide whether a\ and 0 L 2 are the first two bytes of at least one 
legitimate UTF-8 string a{x). 

199. [20] Interpret the contents of register $3 after the following three MMIX instruc¬ 
tions have been executed: MOR $1 ， $0 ， #94; MXQR $2 ， $0 ， #94; SUBU $3,$2,$1. 

200. [20] Suppose x = (xis ... x\Xo)\q has sixteen hexadecimal digits. What one MMIX 
instruction will change each nonzero digit to f，while leaving zeros untouched? 

201. [20] What two instructions will change an octabyte’s nonzero wydes to # ffff ? 

202. [22] Suppose we want to convert a tetrabyte x = {xj ... x\Xq)\q to the octabyte 
y = (" 7 … yi"o) 256 ，where yj is the ASCII code for the hexadecimal digit Xj. For 
example, ii x = # 1234abcd ，y should represent the 8-character string "1234abcd". 
What clever choices of five constants a ， b ， c ， d，and e will make the following MMIX 
instructions do the job? 

MOR t ， x ， a; SLU s ， t, 4; X0R t ， s ， t; AND t ， t ， b; 

ADD t ， t ， c; MOR s ， d ， t; ADD t ， t ， e; ADD y ， t ， s. 

► 203. [22] What are the amazing constants p. q. m that achieve a perfect shuffle 
with just six MMIX commands? (See ( 175 )-( 178 ).) 

204. [20] The perfect shuffle ( 175 ) is sometimes called an “outshuffle，” by comparison 
with the “inshuffle” that takes z ^ y \ x — ("31 尤 31 • • • 尤 i"o 尤 0 ) 2 ; the outshuffle 
preserves the leftmost and rightmost bits of 2 ：， but the inshuffle has no fixed points. 
Can an inshuffle be performed as efficiently as an outshuffle? 

► 205. [23] What’s a fast way for MMIX to transpose an 8 X 8 Boolean matrix? 

► 206. [21 ] Is the suffix parity operation x® of exercise 36 easy to compute with MXQR? 
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207* [22] A puzzle: Register x contains a number 8 j + A:, where 0 < A: < 8 . Registers 
a and b contain arbitrary octabytes [aj ... aiao )256 and ( 67 … 6160 ) 256 - Find a sequence 
of four MMIX instructions that will put aj & bk into register x. 

► 208. [M25] The truth table of a Boolean function f(xi， … . xq) is essentially a 64-bit 
number 

/ = (/(0,0,0,0,0,0).../(l,l ， 1,1,1,0)/(1,1,1,1 ， 1,1)) 2 . 

A 

Show that two MOR instructions will convert / to the truth table of /， the least monotone 
Boolean function that is greater than or equal to / at each point. 

209. [M32] Suppose a = (a 63 … ^ 1 ^ 0)2 represents the polynomial 

a(x) = (a 6 3 ... aiao) x = a 63 x H - h a\x + a 0 . 

Discuss using MX OR to compute the product c{x) = a{x)b{x), modulo x 64 and mod 2. 

► 210. [HM26] Implement the CRC procedure ( 183 ) on MMIX. 

► 211. [HM28] (R. W. Gosper.) Find a short，branchless MMIX computation that will 
compute the inverse of any given 8 x 8 matrix X of Os and Is, modulo 2, assuming that 
detX is odd. 
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SECTION 7.1.3 

1. These operations interchange the bits of x and y in positions where m is 1. (In 

particular, if m = — 1 . the step y ㊉ (x & m)’ becomes just ‘y y ® x’ ， and the 

three assignments will swap x ^ y without needing an auxiliary register. H. S. Warren. 
Jr” has located this trick in vintage-1961 IBM programming course notes.) 

2. All three hold when x and y are nonnegative, or if we regard x and y as “unsigned 
2-adic integers” in which 0 < 1 < 2 < ••• < —3 < —2 < —1. But if negative integers 
are less than nonnegative integers, (i) fails if and only if x < 0 and y < 0 ; (ii) and (iii) 
fail if and only if x ㊉ y < 0 , namely, if and only if x < 0 and y > 0 or x > 0 and y < 0. 

3. Note that x — y = {x ^ y) — 2{x & y) (see exercise 93). By removing bits common 
to x and y at the left, we may assume that Xn-\ = 1 and Vn-i = 0. Then 2(x y) < 

2 ((， e,)- 2 ^) = tx 0 y)-(xe^-i. 

4. x CN =x-^l = x s ,by ( 16 ). Hence x NC = x NCSP = x NCCNP = x NNP = x p . 

5. (a) Disproof: Let x = (• …尤 2 尤 1 尤 o) 2 . Then digit l oi x is xi^k [I > k]. So digit 

l of the left-hand side is xi-k-j [I — k> j], while digit l of the right-hand side is 

xi^j^k [I > J + k]- These expressions agree if J > 0 or A: < 0. But if j < 0 < A:, they 
differ when l = max( 0 . j + k) and xi—j_k = 1. 

(We do, however, have (x 《 j) 《 k C x (j + A:) in all cases.) 

(b) Proof: Digit l in all three formulas is xi^j [l > —j] A yi^k[l > k]- 

6. Since x ^ y > 0 if and only if x > 0, we must have x > 0 if and only if y > 0. 

Obviously x = y is always a solution. The solutions with x > y are (a) x = —1 and 
y = — 2 , or 2 y > x > y > 0 ; (b) x = 2 and y = 1. or 2~~ x > —y > —x > 0 . 

7. Set x ^ {x /io) ® jlo, where /io is the constant in ( 47 ). Then x’ =( … x , 2 x , 1 Xq) 2 ^ 
since (x 7 ®/xo) —/io = (… x 3 x , 2 x 1 Xq )2 — (… 1010)2 =( … ^x , 2 ^x , Q )2 —( … ^ 30 x / 1 0)2 = x. 

[This is Hack 128 in HAKMEM; see answer 20 below. An alternative formula, 
x <— (/io — 尤)®/^。， has also been suggested by D. P. Agrawal. IEEE Trans. C-29 (1980)， 
1032—1035. The results are correct modulo 2 n for all n，but overflow or underflow can 
occur. For example, two’s complement binary numbers in an n-bit register range from 
— 2 n_1 to 2 n _i — 1 ， inclusive，but negabinary numbers range from — |( 2 n — 1 ) to 
I (2 n — 1) when n is even. In general the formula x (x + /i) ® /i converts from 
binary notation to the general number system with binary basis ( 2 n (—l) mn ) discussed 
in exercise 4.1—30(c)，when [i = (… m 2 mimo) 2 .] 

8 . First, : r ㊉ y 拿 (*S ㊉ y)U(x®T). Second, suppose that 0 g < o; ㊉ y，and let = 
(alc/) 2 ，A: = (a 0 c/’） 2 ，where a ， c/，and ol" are strings of 0s and Is with |c/| = \o^ n \. 
Assume by symmetry that x = (/31/3,)2 and y = ( 707 ,) 2 ，where \a\ = |/3| = | 7 |. Then 
念㊉以 =(/307")2 is less than x. Hence k ㊉ y G S, and k = (A; ㊉ y) ㊉ y G 5 ㊉ y. [See R. P. 
Sprague, Tohoku Math. J. 41 (1936)，438-444; P. M. Grundy, Eureka 2 (1939), 6-8.] 

9. The Sprague—Grundy theorem in the previous exercise shows that two piles of x 
and y sticks are equivalent in play to a single pile of x^y sticks. (There is a nonnegative 
integer k < x ® y if and only if there either is a nonnegative i 〈 x with i®y<x®yor 
a nonnegative j < y with x ㊉ j. < x ㊉ y.) So the k piles are equivalent to a single pile 
of size ai ® • • • ® afc- [See C. L. Bouton, Annals of Math. (2) 3 (1901—1902)，35—39.] 

10. For clarity and brevity we shall write simply xy for x ® y and x y for x ㊉ y ， in 
parts (i) through (iv) of this answer only. 

(i) Clearly Oy = 0 and x y = y x and xy = yx. Also ly = by induction on y. 

(ii) Ii x ^ x 1 and y ♦ y’ then xy + xy 1 + x 1 y + x , y , ^ 0, because the definition of 
xy says that xy’ + x y + x y < xy when x < x and y 1 < y. In particular, ii x 0 and 
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y ^ 0 then xy ^ 0. Another consequence is that, if x = mex(*S) and y = mex(T) for 
arbitrary finite sets S and T. we have xy = mexjxjf + iy + ij | < G 5"，jf G T 1 }. 

(iii) Consequently, by induction on the (ordinary) sum of y. and 2：, (x + y)z is 

mex{(x + y)z + [x + y)z + {x 1 + y)z , (x + y)z + (x + y’）z + (x + y)z 

0 <x' <x, 0 <y < y, 0 < z 1 < zj, 

which is mex{xz f + x z x z + yz, xz + yz + y z + y z} = xz + yz. In particular, 
there’s a cancellation law: If xz = yz then {x + y)z = 0, so x = y or 2 ： = 0. 

(iv) By a similar induction, (xy)z = meyi{(xy)z + (xy 1 + x y + x , y , )(z + z ; )}= 
mex{(xy)z , + (xy)z + (xy’ ） z’ + •••} = mex{x(yz) + x(yz) + x(yz f ) + •••} = 
mex{(x + x)(yz + yz + yz) + x 1 {yz)} = x(yz). 

(v) If 0 < x, y < 2 2Tl we shall prove that x ® y < 2 2 ' 2 2 几⑭ y = 2 2 、， and 

2 2T1 (g) 2 2Tl = |2 2?1 . By the distributive law (iii) it suffices to consider the case x = 2 a 

and y = 2 b for 0 < a, 6 < 2 n . Let a = 2 P a and b = 2 q b\ where 0 < < 2 P and 

0 < b f < 2 q ; then x = 2 2P (g) 2 a， and y = 2 29 (8) 2 b， . by induction on n. 

If p < n — 1 and q < n— 1 weVe already proved that x^y < 2 2?1_1 . If p < q = n — 1 ， 

then x (8 ： 2 b， < 2 2 ' hence x (S) y < 2 2Tl . And ifp=q = n — 1, we have x <S) y = 
2 2P <S> 2 2P (g) 2 a， (g) 2 b， = (|2 2P ) (8) 2：, where 2 ： < 2 2P • Thus x ^ y < 2 2?1 in all cases. 

By the cancellation law，the nonnegative integers less than 2 2 几 form a subfield. 
Hence in the formula 


2 2 " 1 (g) y = mex{2 2Tl y ， ㊉ ㊉ y’）| 0 < x’ < 2 2 ' 0 < y < y} 

we can choose x for each y r to exclude all numbers between 2 2Tl y , and 2 2Tl {y f + 1) — 1 ； 
but 2 2Tl y is never excluded. 

Finally in 2 271 ② 2 2 几 =mex{2 2T1 (V ㊉ y ’） ㊉ (V ③ y’) \ 0 < x , y < 2 2?2 }. choosing 
x = y 1 will exclude all numbers up to and including 2 271 — 1， since x ③ x = y ③ y 
implies that (x ㊉ y) ③ （x ㊉ y) = 0, hence x = y. Choosing x = y excludes numbers 
from 2 2?1 to |2 2?1 — 1, since (x0x) ㊉； r = {y^y)®y implies that x = = 2 /㊉ 1， and 

since the most significant bit of x (8 ： x is the same as that of x. This same observation 
shows that ^2 2 is not excluded. QED. 

Consider, for example, the subfield {0, 1...., 15}. By the distributive law we can 
reduce x^y to a sum of x0 1, x02, x(8 ： 4. and/or x08. We have 202 = 3, 204 = 8, 
4 (g) 4 = 6; and multiplication by 8 can be done by multiplying first by 2 and then by 4 
or vice versa, because 8 = 2 ③ 4. Thus 2 0 8= 12, 4(8)8 = 11,8(8)8 = 13. 

In general, for n > 0 ， let n = 2 m + r where 0 < r < 2 n . There is a 2 m+1 X 2 m+1 
matrix Q n such that multiplication by 2 n is equivalent to applying Q n to blocks of 
2 rnJrl bits and working mod 2. For example, Qi = (2) ， and (… X4X3X2X1 尤 o)2 ⑭ 2 1 = 
(•••y4y3U2yiyo)2, where y 0 = xi, ㊉ x 0 , = x 3 ^ y3 = x 3 ㊉ 尤 2, 2/4 = 尤 5, etc. 

The matrices are formed recursively as follows : Let Qo = Ro = (1) and 


Q2 m 







=Q 2 rri + 1 — 1 ， 


where Q r is replicated enough times to make 2 m+1 rows and columns. For example 


Q2 = 


/I 0 1 1\ 

0110 
1000 
\0 1 0 0 / 


Q3 = Q2 




/I 1 0 1\ 
10 11 
110 0 
\1 0 0 0 / 


=i? 2 . 
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If register x holds any 64-bit number, and if 1 < j < 7, the MMIX instruction MXOR y ， q)，x 
will compute y = x ③ 2 J ，given the hexadecimal matrix constants 


q 1 = c08030200c080302, 
q 2 = b06080400b060804, 
q 3 = d0b0c0800d0b0c08, 


q 4 = 8d4b2cl880402010, 
q 5 = c68d342cc0803020, 


q 6 = b9678d4bb0608040, 
q 7 = deb9c68dd0b0c080. 


For further information, see J. H. Conway, On Numbers and Games (1976)，Chapter 6 , 
where it is shown that these definitions actually yield an algebraically closed field over 
the ordinal numbers.] 

11 . Let m = 2 as + • • • + 2 ai with a 5 > ••- > ai > 0 and n = 2 6t + • • • + 2 bl with 
bt > ••• > bi 2 0. Then m ⑭ n = mn if and only if (a s | … | 如 ) & ( 6 * | … | 6 i) = 0. 

12 . If x = 2 2 a -\-b where 0 < a, 6 < 2 2 ， let x’ = r 0 (x ㊉ a). Then 

X = ((2 2 ( 8 ： a) ®6) (g) ((2 2 0 a) © a ㊉ 6) = (2 2 ― 1 0 a (g: a) ㊉ (6 0 (a © 6)) < 2 2 • 


To nim - divide by x we can therefore nim-divide by x and multiply by x 0 a. [This algo¬ 
rithm is due to H. W. Lenstra, Jr.; see Seminaire de Theorie des Nombres (Universite 
de Bordeaux, 1977—1978), expose 11， exercise 5.] 

13. If a 2 ㊉…㊉ afc = ㊉ 03 ㊉…㊉ （ (A: — 2 ) 0 afc) = 0, every move breaks this condition; 

we can’t have (a ⑭ ; r) ㊉ （ 6 ⑭ y) = (a ( 8 ： x 1 ) ㊉ （ 6 ③ y’）when a + h unless (x. y) = (x\ y’）. 

Conversely, if a 2 ㊉…㊉ afc # 0 we can reduce some aj with j > 2 to make this 
sum zero; then a\ can be set to a 3 ㊉…㊉ ((k — 2 ) ㊈ a^). If a 2 ㊉…㊉ flfc =0 and 
ai # ㊉…㊉ （(^ — 2 ) ③ afc)，we simply reduce ai if it is too large. Otherwise there’s a 

jf > 3 such that equality will occur if (jf — 2)0 aj is replaced by an appropriate smaller 
value ((j — 2 ) ⑭ a;) ㊉ （(< — 2 ) ③ （aj ㊉ a;))，for some 2 < z < j and 0 < a》< aj，because 
of the definition of nim multiplication; hence both of the desired equalities are achieved 
by setting aj a f j and ai 4— ai ® aj ㊉ a;、[This game was introduced in Winning Ways 
by Berlekamp, Conway, and Guy, at the end of Chapter 14.] 

14. (a) Each y = [ ••• y 2 yiyo )2 = x T determines x = (… X 2 X 1 尤 0)2 uniquely, since 
zo = yo ㊉ t and \_y/2\ = \_x/2\ T ^o. 

(b) When A: > 0， it is a branching function with labels = a for |/3| = A: — 1 ， 
and to- = 0 for \a\ < k. But when A: < 0. the mapping is not a permutation; in fact, it 
sends infinitely many 2 -adic integers into 0 . 

[The case A: = 1 is particularly interesting: Then x T takes nonnegative integers 
into nonnegative integers of even parity, negative integers into nonnegative integers of 
odd parity，and —1/3 ^ —1. Furthermore L$ T /2」is “Gray binary code，” 7.2.1.1—( 9 ).] 

(c) If p(x 0 y) = A: we have T{x) = T(y) and x = y -\- 2 k (modulo 2 k+1 ). Hence 
p{x T ® y T ) = p(x ㊉ y ㊉ T(x) ㊉ T(y)) = A:. Conversely, if p(x T 0 y T ) = k whenever y = 
x-\-2 k . we obtain a suitable bit labeling by letting = (x T 》 \a\) mod 2 when x= ( 0 ^) 2 . 

(d) This statement follows immediately from (a) and (c). For if we always have 

p(x ® y) = p(x u ^ y u ) = y v )^ then p(x 0 y) = p{x u ® y u ) = p{x uv ® y uv ). And 

if x TU = x for all x, ㊉ y u ) = p(x ㊉ y) is equivalent to p(x ㊉ y) = /?( 尤了 ㊉ y T ). 

We can also construct the labelings explicitly: IiW= UV, note that when a ， 6 , c G 
{ 0 ， 1 } we have W a = U a V a ' ， W a b = U a bV a f b f , and W a bc = UabcVa'b'c ' ， where a’ = a ㊉ 从， 
b’ = b ㊉ u a ， c’ 二 c® and so on; hence w = u ㊉ v, w a = Wa ©rY, 川 ab = u ab 
etc. The labeling T inverse to U is obtained by swapping left and right subtrees of all 
nodes labeled 1 ; thus t = u ， t a = u a >, t a b = u a ， y 、 etc. 

(e) The explicit constructions in (d) demonstrate that the balance condition is 

preserved by compositions and inverses, because = { 0 , 1 } at each level. 
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Notes: Hendrik Lenstra observes that branching functions can profitably be viewed 
as the isometries (distance-preserving permutations) of the 2 -adic integers, when we 
use the formula \ to define the “distance” between 2 -adic integers x and y. 

Moreover, the branching functions mod 2 d turn out to be the Sylow 2-subgroup of the 
group of all permutations of { 0 , 1 ， … ， 2 d_1 } 5 namely the unique (up to isomorphism) 
subgroup that has maximum even order among all subgroups of that group. They also 
are equivalent to the automorphisms of the complete binary tree with 2 d leaves. 

15 . Equivalently, (x+2a)®6 = (x®6)+2a; so we might as well find all b and c such that 
(x ㊉ 6 ) + c = (x + c) ㊉ 6 . By ( 89 ) ， the latter is equivalent to (x ㊉ 6 ㊉ c) + 2( ㊉ 6 ) & c)= 
( 工 ㊉ c ㊉ 6 ) + 2(x & c)，so the condition 6 & c = 0 is necessary and sufficient. Thus the 
condition (a 《 l )&6 = 0 is necessary and sufficient for the original problem. 

16 . (a) If p{x 0 y) = A: we have x = y -\-2 k (modulo 2 fc+1 ); hence x + a 三 y + a + 2 fc 
and p((x + a) ㊉ (y + a)) = &• And p((x ㊉ 6 ) ㊉ (y ㊉ 6 )) is obviously k. 

(b) The hinted labeling, call it P(c), has Is on the path corresponding to c. and 
Os elsewhere; thus it is balanced. The general animating function can be written 

/(CO)，％) — '.. ， — 广 〜㊉ c 叫 where = 6l ㊉…㊉ 

so it is balanced if and only if c m = 0 . 

[Incidentally，the set S = {/^(O)} U {_P(&) ㊉ + 2 e ) | A: > 0 and 2 e > k} provides 
an interesting basis for all possible balanced labelings: A labeling is balanced if and 
only if it is ㊉ {g I g G Q} for some Q G S". This exclusive or is well defined even 
though Q might be infinite, because only finitely many Is appear at each node. 

(c) The function P(c) in (b) has this form, because x p ^ = x ® [x ® c]. Its 

inverse, ； r s ( c ) = ((x ® c) + 1) ® c，is b ㊉ 3 = x p( ^\ Furthermore we have 

x P{c)P{d) — x P(c) 0 ^P(c) 0 ^ = 尤 ㊉ 卜 ㊉ c] ㊉ [_ 尤 ㊉ d s ( c )]，because 尤 ㊉ W = 卜了 ㊉ y T "l for any 

branching function x T . Similarly x p ( c ) p ⑷ p ( e ) = a :㊉ 卜 ㊉ d ㊉ 卜 ㊉ d s ( c )l ㊉ 卜 ㊉ eS ( d ) s ( c )l ， 
etc. After discarding equal terms we obtain the desired form. The resulting numbers 
Pj are unique because they are the only values of x at which the function changes sign. 

(d) We have, for example, t ㊉ 卜 ㊉ a"] ㊉ L x ㊉ 纠 ㊉ L x ㊉ c "l = x p( ^ a ) p ( 6 ) p ( c ) where 
a = a, b 1 = b p ( a， \ and c = 6 - p ( a， )- p ( &， ). 

[The theory of animating functions was developed by J. H. Conway in Chapter 13 
of his book On Numbers and Games (1976)，inspired by previous work of C. P. Welter 
in Indagationes Math. 14 (1952), 304-314; 16 (1954) ， 194-200. 

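17. (Solution by M. Slanina.) Such equations are decidable even if we also allow operations
such as $x \mathbin{\&} y$, $x \ll 1$, $x \gg 1$, $2^{\rho x}$, and $2^{\lambda x}$, and even if we allow Boolean combinations
of statements and quantifications over integer variables, by translating them into
formulas of second-order monadic logic with one successor (S1S). Each 2-adic variable
$x = (\ldots x_2 x_1 x_0)_2$ corresponds to an S1S set variable $X$, where $j \in X$ means $x_j = 1$:

$z = \bar{x}$ becomes $\forall t\,(t \in Z \Leftrightarrow t \notin X)$;

$z = x \mathbin{\&} y$ becomes $\forall t\,(t \in Z \Leftrightarrow (t \in X \land t \in Y))$;

$z = 2^{\rho x}$ becomes $\forall t\,(t \in Z \Leftrightarrow (t \in X \land \forall s\,(s < t \Rightarrow s \notin X)))$;

$z = x + y$ becomes $\exists C\,\bigl(0 \notin C \land \forall t\,\bigl((t \in Z \Leftrightarrow (t \in X) \oplus (t \in Y) \oplus (t \in C)) \land (t + 1 \in C \Leftrightarrow \langle (t \in X)(t \in Y)(t \in C) \rangle)\bigr)\bigr)$.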

An identity such as $x \mathbin{\&} (-x) = 2^{\rho x}$ is equivalent to the translation of

$\forall X\,\forall Y\,\forall Z\,\bigl((\mathrm{integer}(X) \land 0 = x + y \land z = x \mathbin{\&} y) \Rightarrow z = 2^{\rho x}\bigr)$,

where $\mathrm{integer}(X)$ stands for $\exists t\,\forall s\,(s > t \Rightarrow (s \in X \Leftrightarrow t \in X))$. We can include rational
2-adic constants as well; for example, $z = \mu_0$ is equivalent to $0 \in Z \land \forall t\,(t \in Z \Leftrightarrow t + 1 \notin Z)$.
But of course we cannot include arbitrary (uncomputable) constants.

J. R. Büchi proved that all formulas of S1S are decidable, in Logic, Methodology,
and Philosophy of Science: Proceedings (Stanford, 1960), 1–11. If we restrict attention
to equations, one can show in fact that exponential time suffices.

On the other hand M. Hamburg has shown that the problem would be unsolvable 
if px, Ax，or 1were added to the repertoire; multiplication could then be encoded. 

Incidentally, many nontrivial identities exist, even if we use only the operations
$x \oplus y$ and $x + 1$. For example, C. P. Welter noticed in 1952 that

$$((x \oplus (y + 1)) + 1) \oplus (x + 1) = ((((x + 1) \oplus y) + 1) \oplus x) + 1.$$
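Welter's identity can be spot-checked by brute force. In Python, negative integers behave like two's-complement strings of unbounded length, so `^` and `+ 1` act on them the way $\oplus$ and $+1$ act on 2-adic integers. The sketch below compares both sides on a small square of arguments; the function names are ours, and a finite check is evidence, not a proof.

```python
def welter_lhs(x: int, y: int) -> int:
    # ((x XOR (y + 1)) + 1) XOR (x + 1)
    return ((x ^ (y + 1)) + 1) ^ (x + 1)


def welter_rhs(x: int, y: int) -> int:
    # ((((x + 1) XOR y) + 1) XOR x) + 1
    return ((((x + 1) ^ y) + 1) ^ x) + 1


def check_welter(lo: int = -64, hi: int = 64) -> bool:
    """Compare both sides for every pair lo <= x, y < hi."""
    return all(welter_lhs(x, y) == welter_rhs(x, y)
               for x in range(lo, hi)
               for y in range(lo, hi))
```

For instance, `check_welter()` examines all 16384 pairs with $-64 \le x, y < 64$.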


18. Of course row $x$ is entirely blank when $x$ is a multiple of 64. The fine details
of this image are apparently “chaotic” and complex, but there is a fairly easy way to
understand what happens near the points where the straight lines $x = 64j$ intersect
the hyperbolas $xy = 2^{11}k$, for integers $j, k \ge 1$ that aren't too large.

Indeed, when $x$ and $y$ are integers, the value of $x^2y \gg 11$ is odd if and only if
$(x^2y/2^{12}) \bmod 1 \ge \frac12$. Thus, if $x = 64j + \delta$ and $xy = 2^{11}(k + \epsilon)$ we have

$$\frac{x^2y}{2^{12}} = \frac{(64j + \delta)(k + \epsilon)}{2} \equiv 32j\epsilon + \frac{\delta(k + \epsilon)}{2} \pmod 1,$$

and this quantity has a known relation to $\frac12$ when, say, $\delta$ is close to a small integer.
[See C. A. Pickover and A. Lakhtakia, J. Recreational Math. 21 (1989), 166–169.]


19. (a) When $n = 1$, $f(A, B, C)$ has the same value under all arrangements except
when $a_0 \ne a_1$, $b_0 \ne b_1$, and $c_0 \ne c_1$; and then it cannot exceed 1. For larger values of $n$
we argue by induction, assuming that $n = 3$ in order to avoid cumbersome notation. Let

A 0 = (a 0 ， ai ， a2 ， a3)，= (a4, «7), • • • ， Ci = (c 4 , c 5 , c 6 , c 7 ). Then f(A.B,C) = 

Sj®fc©z=o ㊉ z=0 by induction. Thus we can assume 

that ao > ai > >03,04 > > 07, ..., C4 > C5 > cq > C7. We can also 

sort the subvectors Aq = (ao, ai, 04, 05) ， A[ = ( 奶， a3, a6, a7 )，.... C[ = (C2, C3, C6, C7) 
in a similar way. Finally，we can sort Aq = (ao, ai ， a6, a7) ， A'l = (a2, a3, 似，你），…， 
Ci = (C2, C3, C4, C5), because in each term a^b^ci the number of subscripts {j, /} with 

leading bits 01 ， 10 ， and 11 must satisfy soi 三 sio 三 sn (modulo 2). And these three 
sorting operations leave B ， C fully sorted, by exercise 5.3.4—48. 

(b) Suppose A = A*, B = B *, and C = C *. Then we have aj = 1 a Aj ^ 亡 ], 

where aj = aj — a^+i > 0 and we set a 2 ^ = 0; similar formulas hold for bk and ci. Let 
_A( p ) denote the vector (a p (o )， …， ^p(2 ri -i)) when p is a permutation of {0,1, .... 2 n — 1}. 
Then by part (a) we have 


f(A iph B iqh C {r) ) = E i0 fc©z=o [p(j) < t] [q(k) < u][r(l) < v] 

< E i0 fe©z=o J2t,u,v = f(A,B.C). 

[This proof is due to Hardy, Littlewood, and Pólya, Inequalities (1934), §10.3.]

(c) The same proof technique extends to any number of vectors. [R. E. A. C.
Paley, Proc. London Math. Soc. (2) 34 (1932), 263–279, Theorem 15.]

20. The given steps compute the least integer $y$ greater than $x$ such that $\nu y = \nu x$.
They're useful for generating all combinations of $n$ objects, taken $m$ at a time (that is,
all $m$-element subsets of an $n$-element set, with elements represented by 1 bits).

[This tidbit is Hack 175 in HAKMEM, Massachusetts Institute of Technology
Artificial Intelligence Laboratory Memo No. 239 (29 February 1972).]
21. Set $t \leftarrow y + 1$, $u \leftarrow t \oplus y$, $v \leftarrow t \mathbin{\&} y$, $x \leftarrow v - (v \mathbin{\&} {-v})/(u + 1)$. If $y = 2^m - 1$ is
the first $m$-combination, these eight operations set $x$ to zero. (The fact that $x = f(y)$
does not seem to yield any shorter scheme.)

22. Sideways addition avoids the division: SUBU t,x,1; ANDN u,x,t; SADD k,t,x;
ADDU v,x,u; XOR t,v,x; ADDU k,k,2; SRU t,t,k; ADDU y,v,t. But we can actually
save a step by judiciously using the constant mone = −1: SUBU t,x,1; XOR u,t,x;
ADDU y,x,u; SADD k,t,y; ANDN y,y,u; SLU t,mone,k; ORN y,y,t.
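The two MMIX sequences of answer 22, and the eight operations of answer 21, translate directly into Python if every intermediate result is reduced modulo $2^{64}$, as an MMIX register would be. The sketch below transcribes them line by line (the function names are ours) and assumes $x \ne 0$, since the steps are not meant for the empty combination.

```python
MASK64 = (1 << 64) - 1


def popcount(z: int) -> int:
    return bin(z).count("1")


def next_comb(x: int) -> int:
    """First sequence of answer 22: least y > x with nu(y) = nu(x), for x != 0."""
    t = (x - 1) & MASK64            # SUBU t,x,1
    u = x & ~t                      # ANDN u,x,t   (u is the lowest 1 bit of x)
    k = popcount(t & ~x & MASK64)   # SADD k,t,x   (k = rho(x))
    v = (x + u) & MASK64            # ADDU v,x,u
    t = v ^ x                       # XOR t,v,x
    k += 2                          # ADDU k,k,2
    t >>= k                         # SRU t,t,k
    return (v + t) & MASK64         # ADDU y,v,t


def next_comb_mone(x: int) -> int:
    """Second sequence of answer 22, using mone = -1."""
    t = (x - 1) & MASK64            # SUBU t,x,1
    u = t ^ x                       # XOR u,t,x
    y = (x + u) & MASK64            # ADDU y,x,u
    k = popcount(t & ~y & MASK64)   # SADD k,t,y
    y &= ~u                         # ANDN y,y,u
    t = (MASK64 << k) & MASK64      # SLU t,mone,k
    return y | (~t & MASK64)        # ORN y,y,t


def prev_comb(y: int) -> int:
    """Answer 21: the inverse step; it returns 0 when y = 2**m - 1."""
    t = y + 1
    u = t ^ y
    v = t & y
    return v - (v & -v) // (u + 1)   # the division is exact
```

Starting from `x = 0b111` and applying `next_comb` while `x < 64` visits the 20 three-element subsets of a six-element set in increasing order.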

23. (a) $(0\ldots01\ldots1)_2 = 2^m - 1$ and $(0101\ldots01)_2 = (2^{2m} - 1)/3$.

(b) This solution uses the 2-adic constant $\mu_0 = (\ldots010101)_2 = -1/3$:

V 

t 尤 ㊉ /io ， U (t—1) ㊉ t ， V X U, W V 1^ y ^ W -\ - • 

' ’ ’ + 1 
If $x = (2^{2m} - 1)/3$, the operations produce a strange result because $u = 2^{2m+1} - 1$.

(c) XOR t,x,m0; SUBU u,t,1; XOR u,t,u; OR v,x,u; SADD y,u,m0; ADDU w,v,1;
ANDN t,v,w; SRU y,t,y; ADDU y,w,y. [This exercise was inspired by Jörg Arndt.]

24. It's expedient to “prime the pump” by initializing the array to the state that it
should have after all multiples of 3, 5, 7, and 11 have been sieved out. We can combine
3 with 11 and 5 with 7, as suggested by E. Wada:


        LOC   Data_Segment
qbase   GREG  @ ;N IS 3584 ;n GREG N ;one GREG 1
Q       OCTA  #816d129a64b4cb6e                    Q_0 (little-endian)
        LOC   Q+N/16
qtop    GREG  @                                    End of the Q table
Init    OCTA  #9249249249249249|#4008010020040080  Multiples of 3 or 11 in [129 . . 255]
        OCTA  #8421084210842108|#0408102040810204  Multiples of 5 or 7
t IS $255 ;x33 IS $0 ;x35 IS $1 ;j IS $4
        LOC   #100
Main    LDOU  x33,Init; LDOU x35,Init+8
        LDA   j,qbase,8; SUB j,j,qtop              Prepare to set Q_1.
1H      NOR   t,x33,x33; ANDN t,t,x35
        STOU  t,qtop,j                             Initialize 64 sieve bits.
        SLU   t,x33,2; SRU x33,x33,31; OR x33,x33,t   Prepare for the next 64 values.
        SLU   t,x35,6; SRU x35,x35,29; OR x35,x35,t
        ADD   j,j,8; PBN j,1B                      Repeat until reaching qtop. ▮

Then we cast out nonprimes $p^2$, $p^2 + 2p$, …, for $p = 13, 17, \ldots$, until $p^2 > N$:


p IS $0 ;pp IS $1 ;m IS $2 ;mm IS $3 ;q IS $4 ;s IS $5
        LDOU  q,qbase,0; LDA pp,qbase,8
        SET   p,13; NEG m,13*13,n; SRU q,q,6       Begin with p = 13.
1H      SR    m,m,1                                m ← ⌊(p^2 − N)/2⌋.
2H      SR    mm,m,3; LDOU s,qtop,mm
        AND   t,m,#3f; SLU t,one,t
        ANDN  s,s,t; STOU s,qtop,mm                Zero out a bit.
        ADD   m,m,p; PBN m,2B                      Advance by p bits.
        SRU   q,q,1; PBNZ q,3F                     Move to next potential prime.
2H      LDOU  q,pp,0; INCL pp,8                    Read in another batch
        OR    p,p,#7f; PBNZ q,3F                   of potential primes.
        ADD   p,p,2; JMP 2B
2H      SRU   q,q,1
3H      ADD   p,p,2; PBEV q,2B                     Set p ← p + 2 until p is prime.
        MUL   m,p,p; SUB m,m,n; PBN m,1B           Repeat until p^2 > N. ▮

The running time, $1172\mu + 5166\upsilon$, is of course much less than the time needed for steps
P1–P8 of Program 1.3.2′P, namely $10037\mu + 641543\upsilon$ (improved to $10096\mu + 215351\upsilon$
in exercise 1.3.2′–14). [See P. Pritchard, Science of Computer Programming 9 (1987),
17–35, for several instructive variations. In practice, a program like this one tends
to slow down dramatically when the sieve is too big for the computer's cache. Better
results are obtained by working with a segmented sieve, which contains bits for numbers
between $N_0 + k\delta$ and $N_0 + (k + 1)\delta$, as suggested by L. J. Lander and T. R. Parkin,
Math. Comp. 21 (1967), 483–488; C. Bays and R. H. Hudson, BIT 17 (1977), 121–127.

Here $N_0$ can be quite large, but $\delta$ is limited by the cache size; calculations are done
separately for $k = 0, 1, \ldots$. Segmented sieves have become highly developed; see, for
example, T. R. Nicely, Math. Comp. 68 (1999), 1311–1315, and the references cited
there. The author used such a program in 2006 to discover an unusually large gap of
length 1370 between 418032645936712127 and the next larger prime.]
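For experimenting with the table layout, here is a plain Python sieve that uses the same convention as the MMIX program, bit $j$ standing for the odd number $2j + 1$. It is a sketch rather than a transcription: it skips the pump-priming step, octabyte-at-a-time stores, and cache segmentation. The helper `multiples_mask` (our name) recomputes the two `Init` constants, and the low 64 bits of the table should reproduce the constant stored for $Q_0$.

```python
def odd_prime_bits(n: int) -> int:
    """Bit j of the result is 1 iff 2j + 1 is prime, for odd 2j + 1 < n.

    Plain sieve of Eratosthenes over odd numbers: cast out p*p, p*p + 2p, ...
    for each odd p still marked, while p*p < n."""
    q = ((1 << (n // 2)) - 1) & ~1        # all odd numbers except 1
    p = 3
    while p * p < n:
        if (q >> (p // 2)) & 1:           # p is still marked as prime
            for m in range(p * p, n, 2 * p):
                q &= ~(1 << (m // 2))
        p += 2
    return q


def multiples_mask(primes, lo: int, count: int = 64) -> int:
    """Bit i is 1 iff the odd number lo + 2i is divisible by one of `primes`."""
    return sum(1 << i for i in range(count)
               if any((lo + 2 * i) % p == 0 for p in primes))


Q = odd_prime_bits(3584)                  # N = 3584, as in the program
Q0 = Q & ((1 << 64) - 1)                  # primality of the odd numbers below 128
```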

25. (1 + 1 + 25 + 1 + 1 + 25 + 1 + 1 = 56) mm; the worm never sees pages 2–500 of
Volume 1 or 1–499 of Volume 4. (Unless the books have been placed in little-endian
fashion on the bookshelf; then the answer would be 106 mm.) This classic brain-teaser
can be found in Sam Loyd's Cyclopedia (New York: 1914), pages 327 and 383.

26. We could multiply by #aa…ab instead of dividing by 12 (see exercise 1.3.1′–17); but
multiplication is slow too. Instead we can use a scheme that is neither big-endian nor
little-endian but transposed: Put item k into octabyte 8(k mod 2^20), where it is shifted
left by 5⌊k/2^20⌋. Since k < 12000000, the amount of shift is always less than 60.
The MMIX code to put item k into register $1 is AND $0,k,[#fffff]; SLU $0,$0,3;
LDOU $1,base,$0; SRU $0,k,20; 4ADDU $0,$0,$0; SRU $1,$1,$0; AND $1,$1,#1f.

[This solution uses 8 large megabytes (2^23 bytes). Any convenient scheme for
converting item numbers to octabyte addresses and shift amounts will work, as long as
the same method is used consistently. Of course, just ‘LDBU $1,base,k’ would be faster.]
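The address arithmetic of the transposed layout is easy to mirror in Python. In this sketch the table is a list of integers, so octabyte $i$ is `table[i]` rather than byte address $8i$; the function names are ours, and `log_rows` (20 in the answer) can be made smaller for quick experiments.

```python
ITEM_MASK = 0x1F   # items are 5-bit values


def locate(k: int, log_rows: int = 20):
    """Octabyte index k mod 2**log_rows and shift 5 * floor(k / 2**log_rows)."""
    return k & ((1 << log_rows) - 1), 5 * (k >> log_rows)


def get_item(table, k: int, log_rows: int = 20) -> int:
    index, shift = locate(k, log_rows)
    return (table[index] >> shift) & ITEM_MASK


def put_item(table, k: int, value: int, log_rows: int = 20) -> None:
    index, shift = locate(k, log_rows)
    table[index] = (table[index] & ~(ITEM_MASK << shift)) | ((value & ITEM_MASK) << shift)
```

With `log_rows=4`, sixteen words hold 192 items, twelve per word, and no word ever reaches bit 60.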

27. (a) ((x—1) ® x) + x. [This exercise is based on an idea of Luther Woodrum, who 
noticed that $((x - 1) \mathbin{|} x) + 1 = (x \mathbin{\&} {-x}) + x$.]

(b) (y + x)\ y, where y = (x— 1 ) ㊉ ； r. 

(c ， d ， e) ((y © x) + x) & y, ((y ㊉ x) + x) ㊉ y，and ((y ㊉ x) + x) & y，where y = x-1. 
(f) x ㊉ (a); alternatively, y ㊉ (y+1)，where y = x \ (x—1). [The number ( 0 °° 01 a ll 6 )2 
looks simpler, but it apparently requires five operations: ((y + 1 ) & 安） 一 1 .] 
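Woodrum's observation is easy to confirm numerically: for $x \ne 0$, $(x - 1) \mathbin{|} x$ fills in the trailing 0s of $x$, so adding 1 gives $x + 2^{\rho x}$, which is also $(x \mathbin{\&} {-x}) + x$. The Python check below (names ours) covers negative arguments too, since Python treats them as two's-complement strings of unbounded length.

```python
def woodrum_left(x: int) -> int:
    return ((x - 1) | x) + 1


def woodrum_right(x: int) -> int:
    return (x & -x) + x


def check_woodrum(limit: int = 1 << 12) -> bool:
    """Compare both sides for -limit <= x < limit."""
    return all(woodrum_left(x) == woodrum_right(x) for x in range(-limit, limit))
```

For example, both sides take $(0110110)_2$ to $(0111000)_2$.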

28. A 1 bit indicates $x$'s rightmost 0 (for example, $(101011)_2 \mapsto (000100)_2$); $-1 \mapsto 0$.
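One familiar formula with exactly this behavior is $\bar{x} \mathbin{\&} (x + 1)$. The exercise statement is not reproduced here, so treat the sketch below as an illustration of the behavior described, not as the formula the exercise asks about. The `width` parameter is our addition; it confines the result to a word of fixed size.

```python
def rightmost_zero(x: int, width: int = 64) -> int:
    """Isolate the rightmost 0 bit of x as a single 1 bit (0 if there is none)."""
    mask = (1 << width) - 1
    return ~x & (x + 1) & mask
```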

29. $\mu_k = \mu_{k+1} \oplus (\mu_{k+1} \ll 2^k)$ [see STOC 6 (1974), 125]. This relation holds also for
the constants $\mu_{d,k}$ of (48), when $0 \le k < d$, if we start with $\mu_{d,d} = 2^{2^d} - 1$. (There is,
however, no easy way to go from $\mu_k$ to $\mu_{k+1}$, unless we use the “zip” operation; see (77).)
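The recurrence lets a program build every mask $\mu_{d,k}$ from $\mu_{d,d}$ alone. A small Python sketch (the function name is ours), working modulo $2^{2^d}$ as the finite version of the relation requires:

```python
def magic_masks(d: int):
    """Return {k: mu_{d,k}} for 0 <= k <= d, computed downward from
    mu_{d,d} = 2**(2**d) - 1 by mu_k = mu_{k+1} XOR (mu_{k+1} << 2**k)."""
    mask = (1 << (1 << d)) - 1
    mu = {d: mask}
    for k in range(d - 1, -1, -1):
        mu[k] = (mu[k + 1] ^ (mu[k + 1] << (1 << k))) & mask
    return mu
```

For $d = 6$ this yields the familiar 64-bit constants #5555555555555555, #3333333333333333, …, #00000000ffffffff.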

30. Append ‘CSZ rho,x,64’ to (50), thereby adding $1\upsilon$ to its execution time. For (51),
we simply need to make sure that rhotab[0] = 8.

31. In the first place, his code loops forever when $x = 0$. But even after that bug is
patched, his assumption that $x$ is a random integer is highly questionable. In most
applications when we want to compute $\rho x$ for a nonzero 64-bit number $x$, a more
reasonable assumption would be that each of the outcomes $\{0, 1, \ldots, 63\}$ is equally
likely. The average and standard deviation then become 31.5 and $\approx 18.5$.
32. ‘NEGU y,x; AND y,x,y; MULU y,debruijn,y; SRU y,y,58; LDB rho,decode,y’ has
estimated cost $\mu + 14\upsilon$, although multiplication by a power of 2 might well be faster
than a typical multiplication. Add $1\upsilon$ for the correction in answer 30.
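The same multiply-and-look-up idea in Python, as a sketch. Instead of hard-coding a particular `debruijn` constant, it generates a binary de Bruijn cycle of order 6 with the standard Lyndon-word (Fredricksen–Kessler–Maiorana) construction. That cycle begins with six 0s, so the top six bits of `debruijn << k` are distinct for $0 \le k < 64$, and `decode` is built from them. The `x == 0` branch plays the role of the correction in answer 30; all names are ours.

```python
MASK64 = (1 << 64) - 1


def de_bruijn_cycle() -> int:
    """64-bit binary de Bruijn cycle of order 6, read most significant bit first."""
    n, a, bits = 6, [0] * 7, []

    def gen(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                bits.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            gen(t + 1, p)
            if a[t - p] == 0:          # binary alphabet: try the digit 1 next
                a[t] = 1
                gen(t + 1, t)

    gen(1, 1)
    return int("".join(map(str, bits)), 2)


DEBRUIJN = de_bruijn_cycle()
DECODE = [0] * 64
for k in range(64):
    # multiplying by 2**k shifts left; the top 6 bits name the window at k
    DECODE[((DEBRUIJN << k) & MASK64) >> 58] = k


def rho(x: int) -> int:
    """Ruler function of a 64-bit x, with rho(0) = 64."""
    x &= MASK64
    if x == 0:
        return 64
    y = x & (-x & MASK64)                            # NEGU y,x; AND y,x,y
    return DECODE[((DEBRUIJN * y) & MASK64) >> 58]   # MULU; SRU y,y,58; LDB
```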

33. In fact, an exhaustive calculation shows that exactly 94727 suitable constants $a$
yield a “perfect hash function” for this problem, 90970 of which also identify the power-of-two
cases $y = 2^j$; 90918 of those also distinguish the case $y = 0$. The multiplier
#208b2430c8c82129 is uniquely best, in the sense that it doesn't need to refer to table
entries above decode[32400] when $y$ is known to be a valid input.

34. Proof: If $\rho x = \rho y$ we have either $x = y = 0$ or $x = \alpha 1 0^k$ and $y = \beta 1 0^k$; hence
$x \oplus y = (\alpha \oplus \beta) 0 0^k = (x - 1) \oplus (y - 1)$. If $\rho x > \rho y = k$ we have
$(x \oplus y) \bmod 2^{k+2} \ne ((x - 1) \oplus (y - 1)) \bmod 2^{k+2}$.

35. Let $f(x) = x \oplus 3x$. Clearly $f(2x) = 2f(x)$, and $f(4x + 1) = 4f(x) + 2$. We also
have $f(4x - 1) = 4f(x) + 2$, by exercise 34. The hinted identity follows.

Given $n$, set $u \leftarrow n \gg 1$, $v \leftarrow u + n$, $t \leftarrow u \oplus v$, $n^+ \leftarrow v \mathbin{\&} t$, and $n^- \leftarrow u \mathbin{\&} t$.
Clearly $u = \lfloor n/2 \rfloor$ and $v = \lfloor 3n/2 \rfloor$, so $n^+ - n^- = v - u = n$. And this is Reitwiesner's
representation, because $n^+ \mathbin{|} n^-$ has no consecutive 1s. [H. Prodinger, Integers 0 (2000),
paper A8, 14 pp. Incidentally we also have $f(-x) = f(x)$.]
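The five steps translate directly into Python. The sketch below (function name ours, for $n \ge 0$) also checks the properties claimed above: $n^+ - n^- = n$, disjoint digit sets, and no two adjacent nonzero digits. It further checks that $n^+ \mathbin{|} n^- = t = (n \oplus 3n) \gg 1$, which holds because $\lfloor n/2 \rfloor \oplus \lfloor 3n/2 \rfloor = (n \oplus 3n) \gg 1$.

```python
def reitwiesner(n: int):
    """Return (n_plus, n_minus) with n = n_plus - n_minus, no adjacent 1s in n_plus | n_minus."""
    u = n >> 1        # floor(n / 2)
    v = u + n         # floor(3n / 2)
    t = u ^ v         # positions of the nonzero digits
    return v & t, u & t
```

For example, `reitwiesner(7)` gives `(8, 1)`, that is, $7 = 8 - 1$.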

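36. (i) The commands $x \leftarrow x \oplus (x \ll 1)$, $x \leftarrow x \oplus (x \ll 2)$, $x \leftarrow x \oplus (x \ll 4)$, $x \leftarrow x \oplus (x \ll 8)$,
$x \leftarrow x \oplus (x \ll 16)$, $x \leftarrow x \oplus (x \ll 32)$ change $x$ to $x^{\oplus}$. (ii) … $= ((x + 1) \mathbin{\&} x) - 1$.

(See exercises 66, 70, and 117 for applications of $x^{\oplus}$; see also exercise 206.)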
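The six commands of part (i) can be checked against a direct definition of $x^{\oplus}$, whose bit $k$ is $x_0 \oplus \cdots \oplus x_k$. The Python sketch below assumes a 64-bit word (names ours); the test also confirms that $x^{\oplus} \oplus (x^{\oplus} \ll 1)$ recovers $x$ modulo $2^{64}$.

```python
MASK64 = (1 << 64) - 1


def prefix_xor(x: int) -> int:
    """x^(+) for a 64-bit x, by six doubling shift-and-XOR steps."""
    for s in (1, 2, 4, 8, 16, 32):
        x ^= (x << s) & MASK64
    return x


def prefix_xor_naive(x: int) -> int:
    """Reference: bit k is the parity of bits 0..k of x."""
    result, parity = 0, 0
    for k in range(64):
        parity ^= (x >> k) & 1
        result |= parity << k
    return result
```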

37. Insert ‘CSZ y,x,half’ after the FLOTU in (55), where half = #3fe0000000000000;
note that (55) says ‘SR’ (not ‘SRU’). No change is needed to (56), if lamtab[0] = −1.

38. ‘SRU t,x,1; OR y,x,t; SRU t,y,2; OR y,y,t; SRU t,y,4; OR y,y,t; …;
SRU t,y,32; OR y,y,t; SRU y,y,1; CMPU t,x,0; ADDU y,y,t’ takes $15\upsilon$.
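As the code shows, these fifteen instructions produce $2^{\lambda x}$ when $x \ne 0$ and 0 when $x = 0$: smearing gives $2^{\lambda x + 1} - 1$, the final shift leaves $2^{\lambda x} - 1$, and the comparison adds 1 exactly when $x > 0$. A line-by-line Python transcription (ours) for 64-bit unsigned $x$:

```python
def highest_bit(x: int) -> int:
    """2**lambda(x) for 0 < x < 2**64, and 0 for x = 0."""
    y = x
    for s in (1, 2, 4, 8, 16, 32):
        y |= y >> s           # SRU t,y,s; OR y,y,t  (smear the leading 1 rightward)
    y >>= 1                   # SRU y,y,1            (now 2**lambda(x) - 1, or 0)
    return y + (x != 0)       # CMPU t,x,0; ADDU y,y,t
```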

39. (Solution by H. S. Warren, Jr.) Let $\sigma(x)$ denote the result of smearing $x$ to the
right, as in the first line of ( 57 ). Compute x & <j[(x 》 1) & x). 

40. Suppose $\lambda x = \lambda y = k$. If $x = y = 0$, (58) certainly holds, regardless of how we
define $\lambda 0$. Otherwise $x = (1\alpha)_2$ and $y = (1\beta)_2$, for some binary strings $\alpha$ and $\beta$ with
$|\alpha| = |\beta| = k$; and $x \oplus y < 2^k \le x \mathbin{\&} y$. On the other hand if $\lambda x < \lambda y = k$, we have
$x \oplus y \ge 2^k > x \mathbin{\&} y$. And H. S. Warren, Jr., notes that $\lambda x < \lambda y$ if and only if $x < y \mathbin{\&} \bar{x}$.

41. (a) $\sum_{n \ge 1} (\rho n) z^n = \sum_{k \ge 1} z^{2^k}/(1 - z^{2^k}) = z/(1 - z) - \sum_{k \ge 0} z^{2^k}/(1 + z^{2^k})$. The
Dirichlet generating function is simpler: $\sum_{n \ge 1} (\rho n)/n^z = \zeta(z)/(2^z - 1)$.

(b) $\sum_{n \ge 1} (\lambda n) z^n = \sum_{k \ge 1} z^{2^k}/(1 - z)$.

(c) $\sum_{n \ge 1} (\nu n) z^n = \sum_{k \ge 0} z^{2^k}/\bigl((1 - z)(1 + z^{2^k})\bigr) = \sum_{k \ge 0} z^{2^k} \mu_k(z)$,
where $\mu_k(z) = (1 + z + \cdots + z^{2^k - 1})/(1 - z^{2^{k+1}})$. (The “magic masks” of (47) correspond to $\mu_k(2)$.)

[See Automatic Sequences by J.-P. Allouche and J. Shallit (2003), Chapter 3, for
further information about the functions $\rho$ and $\nu$, which they denote by $\nu_2$ and $s_2$.]

42. $e_1 2^{e_1 - 1} + (e_2 + 2) 2^{e_2 - 1} + \cdots + (e_r + 2r - 2) 2^{e_r - 1}$, by induction on $r$. [D. E. Knuth,
Proc. IFIP Congress (1971), 1, 19–27. The fractal aspects of this sum are illustrated
in Figs. 3.1 and 3.2 of the book by Allouche and Shallit.]
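The closed form is easy to check by machine. The Python sketch below assumes, as the use of $S(n)$ in answer 59 suggests, that the sum in question is $S(n) = \sum_{0 \le k < n} \nu k$, and that $n = 2^{e_1} + \cdots + 2^{e_r}$ with $e_1 > \cdots > e_r$. Neither assumption is restated in this answer, so treat both as our reading.

```python
def S(n: int) -> int:
    """Closed form of answer 42 for n = 2**e_1 + ... + 2**e_r, e_1 > ... > e_r."""
    exps = [e for e in range(n.bit_length() - 1, -1, -1) if (n >> e) & 1]
    total = 0
    for r, e in enumerate(exps, start=1):
        # (e_r + 2r - 2) * 2**(e_r - 1); the factor e + 2r - 2 is even when e = 0
        total += ((e + 2 * r - 2) << e) >> 1
    return total
```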

43. The straightforward implementation of (63), ‘SET nu,0; SET y,x; BZ y,Done;
1H ADD nu,nu,1; SUBU t,y,1; AND y,y,t; PBNZ y,1B’, costs $(5 + 4\nu x)\upsilon$; it beats the
implementation of (62) when $\nu x < 4$, ties when $\nu x = 4$, and loses when $\nu x > 4$.

But we can save $4\upsilon$ from the implementation of (62) if we replace the final
multiplication-and-shift by $y \leftarrow y + (y \gg 8)$, $y \leftarrow y + (y \gg 16)$, $y \leftarrow y + (y \gg 32)$,
$\nu \leftarrow y \mathbin{\&} \text{\#ff}$. [Of course, MMIX's single instruction ‘SADD nu,x,0’ is much better.]
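For comparison, here are Python versions of the three counting methods just mentioned. We assume that (62) has its usual form, three masked additions followed by a multiplication by $(11111111)_{256}$; `nu_62_shifts` uses the replacement suggested above, and `nu_63` is the loop that clears the rightmost 1 bit on each pass. All names are ours, and `bin(x).count("1")` serves as the reference.

```python
MASK64 = (1 << 64) - 1
MU0, MU1, MU2 = 0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F


def nu_62(x: int) -> int:
    """Bit count by masked additions, then a multiply by (11111111)_256."""
    y = x - ((x >> 1) & MU0)              # 2-bit fields hold their bit counts
    y = (y & MU1) + ((y >> 2) & MU1)      # 4-bit fields
    y = (y + (y >> 4)) & MU2              # bytes
    return ((0x0101010101010101 * y) & MASK64) >> 56


def nu_62_shifts(x: int) -> int:
    """Same, with the multiply-and-shift replaced by three shift-and-adds."""
    y = x - ((x >> 1) & MU0)
    y = (y & MU1) + ((y >> 2) & MU1)
    y = (y + (y >> 4)) & MU2
    y += y >> 8
    y += y >> 16
    y += y >> 32
    return y & 0xFF


def nu_63(x: int) -> int:
    """Clear the rightmost 1 bit until none remain; loops nu(x) times."""
    count = 0
    while x:
        count += 1
        x &= x - 1
    return count
```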
44. Let this sum be $\nu^{(2)}x$. If we can solve the problem for $2^d$-bit numbers, we can
solve it for $2^{d+1}$-bit numbers, because $\nu^{(2)}(2^{2^d}x' + x) = \nu^{(2)}x + \nu^{(2)}x' + 2^d\nu x'$. Therefore
a solution analogous to (62) suggests itself, on a 64-bit machine:

Set $z \leftarrow (x \gg 1) \mathbin{\&} \mu_0$ and $y \leftarrow x - z$.

Set $z \leftarrow ((z + (z \gg 2)) \mathbin{\&} \mu_1) + ((y \mathbin{\&} \bar\mu_1) \gg 1)$ and $y \leftarrow (y \mathbin{\&} \mu_1) + ((y \gg 2) \mathbin{\&} \mu_1)$.

Set $z \leftarrow ((z + (z \gg 4)) \mathbin{\&} \mu_2) + ((y \mathbin{\&} \bar\mu_2) \gg 2)$ and $y \leftarrow (y + (y \gg 4)) \mathbin{\&} \mu_2$.

Finally $\nu^{(2)}x \leftarrow (((Az) \bmod 2^{64}) \gg 56) + ((((By) \bmod 2^{64}) \gg 56) \ll 3)$,
where $A = (11111111)_{256}$ and $B = (01234567)_{256}$.

But another approach is better on MMIX, which has sideways addition built in:

SADD nu2,x,m0; SADD t,x,m1; 2ADDU nu2,t,nu2; SADD t,x,m2; 4ADDU nu2,t,nu2; SADD t,x,m3;
8ADDU nu2,t,nu2; SADD t,x,m4; 16ADDU nu2,t,nu2; SADD t,x,m5; SLU t,t,5; ADD nu2,nu2,t. ▮

[In general, $\nu^{(2)}x = \sum_k 2^k \nu(x \mathbin{\&} \bar\mu_k)$. See Dr. Dobb's Journal 8, 4 (April 1983), 24–37.]
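Both approaches are easy to test in Python. In this sketch (names ours), `nu2_sadd` evaluates $\sum_k 2^k \nu(x \mathbin{\&} \bar\mu_k)$, the quantity the SADD sequence accumulates, since MMIX's SADD counts the bits of $y \mathbin{\&} \bar z$. `nu2_masks` follows the four-step scheme with the constants $A$ and $B$. Each is compared with the direct sum $\sum_j j\,x_j$.

```python
MASK64 = (1 << 64) - 1
MU = [0x5555555555555555, 0x3333333333333333, 0x0F0F0F0F0F0F0F0F,
      0x00FF00FF00FF00FF, 0x0000FFFF0000FFFF, 0x00000000FFFFFFFF]


def nu2_sadd(x: int) -> int:
    """sum_k 2**k * nu(x & ~mu_k)."""
    return sum(bin(x & ~m & MASK64).count("1") << k for k, m in enumerate(MU))


def nu2_masks(x: int) -> int:
    """The (62)-like scheme: z holds index-weighted counts, y plain counts, per field."""
    mu0, mu1, mu2 = MU[0], MU[1], MU[2]
    z = (x >> 1) & mu0
    y = x - z                                             # 2-bit fields
    z = ((z + (z >> 2)) & mu1) + ((y & ~mu1 & MASK64) >> 1)
    y = (y & mu1) + ((y >> 2) & mu1)                      # 4-bit fields
    z = ((z + (z >> 4)) & mu2) + ((y & ~mu2 & MASK64) >> 2)
    y = (y + (y >> 4)) & mu2                              # bytes
    a, b = 0x0101010101010101, 0x0001020304050607         # (11111111)_256, (01234567)_256
    return (((a * z) & MASK64) >> 56) + ((((b * y) & MASK64) >> 56) << 3)
```

Note that the update of `z` in each step uses the value of `y` from before that step, as in the text.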

45. Let d = {x — y) {y — x)\ test if d y = d. [Rokicki found that this idea can be 
used with node addresses to near-randomize binary search trees or Cartesian trees as
if they were treaps, without needing an additional random “priority key” in each node. 
See U.S. Patent 6347318 (12 February 2002).] 

46. SADD t,x,m; NXOR y,x,m; CSOD x,t,y; the mask $m$ is $\overline{(1 \ll i) \mathbin{|} (1 \ll j)}$. (In general,
these instructions complement the bits specified by $\bar m$ if those bits have odd parity.)
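A Python transcription of the three instructions (ours), using MMIX semantics: SADD counts the bits of $x \mathbin{\&} \bar m$, NXOR yields the complement of $x \oplus m$ (that is, $x \oplus \bar m$), and CSOD replaces $x$ when the count is odd. With $m = \overline{(1 \ll i) \mathbin{|} (1 \ll j)}$ this exchanges bits $i$ and $j$:

```python
MASK64 = (1 << 64) - 1


def swap_bits(x: int, i: int, j: int) -> int:
    """Exchange bits i and j (i != j) of a 64-bit x."""
    m = ~((1 << i) | (1 << j)) & MASK64     # m = complement of (1<<i | 1<<j)
    t = bin(x & ~m & MASK64).count("1")     # SADD t,x,m: nu(x & ~m)
    y = ~(x ^ m) & MASK64                   # NXOR y,x,m: x XOR ~m
    return y if t & 1 else x                # CSOD x,t,y: take y when t is odd
```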

47. y 4 — (x 8) 0. 2 ： 4 — (x 0) x i- (x ^ m) \ y \ where rn = 6 \ (0 8). 

48. Given there are ss = jPi ( n +j) /s\+i different (5-swaps, including the identity 

permutation. (See exercise 4.5.3-32.) Summing over S gives 1+X^6=i ( s 6 — 1) altogether. 

49. (a) The set $S = \{a_1\delta_1 + \cdots + a_m\delta_m \mid a_1, \ldots, a_m \in \{-1, 0, +1\}\}$ for displacements
$\delta_1, \ldots, \delta_m$ must contain $\{n - 1, n - 3, \ldots, 1 - n\}$, because the $k$th bit must move $n + 1 - 2k$
places for $1 \le k \le n$. Hence $|S| \ge n$. And $S$ contains at most $3^m$
numbers, at most $2 \cdot 3^{m-1}$ of which are odd.

(b) Clearly $s(mn) \le s(m) + s(n)$, because we can reverse $m$ fields of $n$ bits each.
Thus $s(3^m) \le m$ and $s(2 \cdot 3^m) \le m + 1$. Furthermore the reversal of $3^m$ bits uses
only swaps with even values of $\delta$; the corresponding $(\delta/2)$-swaps prove that we have
$s((3^m \pm 1)/2) \le m$. These upper bounds match the lower bounds of (a) when $m > 0$.

(c) The string $\alpha a \beta \theta \phi z \omega$ with $|\alpha| = |\beta| = |\theta| = |\phi| = |\omega| = n$ can be changed to
$\omega z \phi \theta \beta a \alpha$ with a $(3n + 1)$-swap followed by an $(n + 1)$-swap. Then $s(n)$ further swaps
reverse all. Hence $s(32) \le s(6) + 2 = 4$, and $s(64) \le 5$. Again, equality holds by (a).

Incidentally, $s(63) = 4$ because $s(7) = s(9) = 2$. The lower bound in (a) turns out
to be the exact value of $s(n)$ for $1 \le n \le 22$, except that $s(16) = 4$.

50* Express n = (t m .. . tito )3 in balanced ternary notation. Let rij = (tm ... tj )3 and 
Sj = 2rij + 心 一 i, so that nj_i — Sj = rij and 2Sj — rij 一 i = rij + tj for 1 < j < m. Let 
Eq = {0} and Ej^i = Ej U (tj — Ej) for 0 < j < m. (Thus，for example, Ei = {0, to} 
and E 2 = { 0 ,t o , ti，ti — t 0 }.) 

Assume by induction on j that d-swaps for d = di， … ，〜have changed the n- 
bit word a 1 ... a 3 j to a 3 j … o^，where each subword has length rij + Sk for some 
£k G Ej. If j < m，a dj+i-swap within each subword will preserve this assumption. If 
j = m，each ak has \ak \ < m + 1, because e G Ej implies |e| < j. Therefore 2-swaps 
for Llg m 」> A: > 0 will reverse them all. (Note that a 2 fe -swap on a subword of size t, 
where 2 k < t < 2 fc+1 ，reduces it to three subwords of sizes t — 2 k ^ 2 k+1 — t — 2 k •) 
51. (a) If c = (cd-i .. . Co) 2 . we must have Od-i = Cd —— i. But for 0 < k < d — 1


we can take Ok = ㊉ 6k^ where Ok is any mask C fid.k- 

(b) Let Q(d. c) be the set of all such mask sequences. Clearly ㊀ （ 1 ， c) = {c}. When 
d > 1 we will have, recursively, 


㊀ (d ， c) = {(0 0 ,… ，〜一 2 , 〜一 1 ，心一 2 , • • • ， 6。） I 0 k = ^fc -1 i 0 k = 私一 1 ：E 敗一 1 } 


by “zipping together” two sequences (Oq^ ..., 沒 ^ 一 3 , 沒 i_ 2 , 沒 i_ 3 , … ，的 ） G G(d — l.c) and 
( 敗，…，^一 3 , ^ 一 2 ， some appropriate Qo, 谷 0 , c’，and c"• 


When c is odd. the bigraph corresponding to ( 75 ) has only one cycle; so (^o, 沒 0 , 
c’ ， c") is either 0. [c/2], L c /2」）or (0,/id ，。， L c /2 」，「 c/2]). But when c is even, the 

bigraph has 2 d_1 double bonds; so Oo = Oo is any mask C /id,o，and c = c 11 = c/2. 
[Incidentally, lg\G(d,c) \ = 2 d ~ 1 (d- 1) - - l)(2 fcZl - l^- 1 — cmod 2 fe |).] 


In both cases we can therefore let Od -2 = • • • = = 0 and omit the second half 

of ( 71 ) entirely. Of course in case (b) we would do the cyclic shift directly, instead of 
using ( 71 ) at all. But exercise 58 proves that many other useful permutations, such as 


selective reversal followed by cyclic shift, can also be handled by ( 71 ) with 0^=0 for 
all k. The inverses of those permutations can be handled with 0^=0 for 0 < A; < 1. 


52. The following solutions make $\hat\theta_j = 0$ whenever possible. We shall express the
$\theta$ masks in terms of the $\mu$'s, for example by writing $\mu_{6,0} \mathbin{\&} \mu_{6,5}$ instead of stating the
requested hexadecimal form #55555555; the $\mu$ form is shorter and more instructive.


(a) /ie,fc and Ok = (/ifc+i ㊉ /ifc-i) & for 0 < A: < 5; ^5 = O 4 . (Here 


/i-i = 0. To get the “other” perfect shuffle, (X 31 X 63 ... ^ 1 ^ 33 ^ 0 ^ 32 ) 2 , let Oo = /i6,o&/ii.) 


A (b) Oo = 0 3 = 0 o = /i 6 ,0 1^3] 0i = O4 = 0i = /i6,l & M4; 沒 2 = 汐 5 = 沒 2 = /^6,2 & /^5 ； 

§ 3 = 0 4 = 0. [See J. Lenfant. IEEE Trans. C-27 (1978). 637-647, for a general theory.] 

( c ) ^0 ― /^ 6,0 & •，❽ 1 = /^ 6,1 & /^5 5 ^2 ― ^4 — /^ 6,2 & ^3 ― ^5 ― /^ 6,3 & /^5 5 

谷 0 = "6,0 & /X2 ； ^1 = "6，1 & ^3 ； ^2 = ^0 ® ^2 ； A = ㊉ 沒 3 ; 64 = 0. 

(d) Ok = f^6,k & 1 ^ 5 -k for 0 < A: < 5; for 0 < k < 2: 0 3 = 0 4 = 0. 

53. We can write $\pi$ as a product of $d - t$ transpositions, $(u_1v_1)\ldots(u_{d-t}v_{d-t})$ (see
exercise 5.2.2–2). The permutation induced by a single transposition $(uv)$ on the index
digits, when $u < v$, corresponds to a $(2^v - 2^u)$-swap with mask $\mu_{d,v} \mathbin{\&} \bar\mu_{d,u}$. We should
do such a swap for $(u_1v_1)$ first, …, $(u_{d-t}v_{d-t})$ last.

In particular, the perfect shuffle in a $2^d$-bit register corresponds to the case where
$\pi = (0\,1\,\ldots\,(d-1))$ is a one-cycle; so it can be achieved by doing such $(2^v - 2^u)$-swaps
for $(u, v) = (0, 1), \ldots, (0, d - 1)$. For example, when $d = 3$ the two-step procedure is
12345678 ↦ 13245768 ↦ 15263748. [Guy Steele suggests an alternative $(d - 1)$-step
procedure: We can do a $2^k$-swap with mask $\mu_{d,k+1} \mathbin{\&} \bar\mu_{d,k}$ for $d - 1 > k \ge 0$. When $d = 3$
his method takes 12345678 ↦ 12563478 ↦ 15263748.]
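The swap sequences above can be checked mechanically. The Python sketch below (all helper names are ours) builds the masks $\mu_{d,k}$ and implements a $\delta$-swap with the familiar XOR formula $y \leftarrow (x \oplus (x \gg \delta)) \mathbin{\&} \theta$, $x \leftarrow x \oplus y \oplus (y \ll \delta)$. It then tracks where each symbol of a string goes, with the leftmost symbol in the most significant position, and reproduces both $d = 3$ examples as well as a 16-element perfect shuffle.

```python
def mu(d: int, k: int) -> int:
    """mu_{d,k} on 2**d bits: bit p is 1 iff bit k of p is 0."""
    return sum(1 << p for p in range(1 << d) if not (p >> k) & 1)


def mu_bar(d: int, k: int) -> int:
    """Complement of mu_{d,k} within 2**d bits."""
    return ((1 << (1 << d)) - 1) ^ mu(d, k)


def delta_swap(x: int, delta: int, theta: int) -> int:
    """Exchange bit p with bit p + delta for every 1 bit p of theta."""
    y = (x ^ (x >> delta)) & theta
    return x ^ y ^ (y << delta)


def shuffle_by_transpositions(d: int):
    """(delta, mask) pairs for (u, v) = (0, 1), ..., (0, d-1)."""
    return [((1 << v) - 1, mu(d, v) & mu_bar(d, 0)) for v in range(1, d)]


def shuffle_steele(d: int):
    """(delta, mask) pairs for the 2**k-swaps, k = d-2, ..., 0."""
    return [(1 << k, mu(d, k + 1) & mu_bar(d, k)) for k in range(d - 2, -1, -1)]


def apply_to_string(s: str, ops) -> str:
    """Move each symbol of s as a single 1 bit through the swaps in ops."""
    n = len(s)
    out = [""] * n
    for i, ch in enumerate(s):
        x = 1 << (n - 1 - i)          # leftmost symbol sits in bit n - 1
        for delta, theta in ops:
            x = delta_swap(x, delta, theta)
        out[n - x.bit_length()] = ch
    return "".join(out)
```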

The matrix transposition in exercise 52(b) corresponds to $d = 6$ and $(u, v) = (0, 3)$,
$(1, 4)$, $(2, 5)$. These operations are the 7-swap, 14-swap, and 28-swap steps for $8 \times 8$
matrix transposition illustrated in the text; they can be done in any order.

For exercise 52(c), use $d = 6$ and $(u, v) = (0, 2), (1, 3), (0, 4), (1, 5)$. Exercise 52(d)
is as easy as 52(b), with $(u, v) = (0, 5), (1, 4), (2, 3)$.

54. Transposition amounts to reversing the bits of the minor diagonals. Successive
elements of those diagonals are $m - 1$ apart in the register. Simultaneous reversal of
all diagonals corresponds to simultaneous reversal of subwords of sizes $1, \ldots, m$, which
can be done with $2^k(m - 1)$-swaps for $0 \le k < \lceil \lg m \rceil$ (because such transposition is easy
when $m$ is a power of 2, as illustrated in the text). Here's the procedure for $m = 7$:


Given 

00 01 02 03 04 05 06 
10 11 12 13 14 15 16 
20 21 22 23 24 25 26 
30 31 32 33 34 35 36 
40 41 42 43 44 45 46 
50 51 52 53 54 55 56 
60 61 62 63 64 65 66 


6-swap

00 10 02 12 04 14 06
01 11 03 13 05 15 25
20 30 22 32 24 16 26
21 31 23 33 43 35 45
40 50 42 34 44 36 46
41 51 61 53 63 55 65
60 52 62 54 64 56 66


12-swap

00 10 20 30 04 14 24
01 11 21 31 05 15 25
02 12 22 32 06 16 26
03 13 23 33 43 53 63
40 50 60 34 44 54 64
41 51 61 35 45 55 65
42 52 62 36 46 56 66


24-swap 

00 10 20 30 40 50 60 
01 11 21 31 41 51 61 
02 12 22 32 42 52 62 
03 13 23 33 43 53 63 
04 14 24 34 44 54 64 
05 15 25 35 45 55 65 
06 16 26 36 46 56 66 


55 . Given x and y. first set x x \ (x<^2 k ) and y y\(y^2 k ) for 2d < k < 3d. Then 
set x (2 2d+k — 2 fc )-swap of x with mask fi 2 d+k & J^k and y 4 — ( 2 2d+fc — 2 d+fc )-swap of y 
with mask fJ/ 2 d+k&]^d+k for 0 < k < d. Finally set 2 ： x&y, then either z <— z I (z^$>2 k ) 
or 2 : 1 2 : ® ( 2 : 》 2 fc ) for 2d < k < 3d. and z ^ z (2 n2 — 1). [The idea is to form two 

71 X 7T- X 77/ arrays X — (xooo • • - *^(n—l)(n—l)(n—l))n and y = (yooo • • • 2/(n—1) (n —1) (n—1) )n 

with Xijk = cijk and yijk = bjk, then transpose coordinates so that Xijk = dji and 
yijk = bn ，now x^y does all n 3 bitwise multiplications at once. This method is due to 
V. R. Pratt and L. J. Stockmeyer, J. Computer and System Sci. 12 (1976), 210–213.]

56. Use (71) with $\theta_0 = \hat\theta_0 = 0$, $\theta_1$ = #0010201122113231, $\theta_2$ = #00080e0400080c06,
$\theta_3$ = #00000092008100a2, $\theta_4$ = #0000000000000f16, $\theta_5$ = #0000000003199c26,
$\hat\theta_4$ = #00000c9f0000901a, $\hat\theta_3$ = #003a00b50015002b, $\hat\theta_2$ = #000103080c0d0f0c, and
$\hat\theta_1$ = #0020032033233333.


57. The two choices for each cycle when ^ > 1 have complementary settings. So we 
can choose a setting in which at least half of the crossbars are inactive, except in the 
middle column. (See exercise 5.3.4-55 for more about permutation networks.) 

58. (a) Every different setting of the crossbars gives a different permutation, because
there is exactly one path from input line $i$ to output line $j$ for all $0 \le i, j < N$. (A network
with that property is called a “banyan.”) The unique such path carries input $i$
on line $l(i, j, k) = ((i \gg k) \ll k) + (j \bmod 2^k)$ after $k$ swapping steps have been made.

(b) We have $l(i\varphi, i, k) = l(j\varphi, j, k)$ if and only if $i \bmod 2^k = j \bmod 2^k$ and
$i\varphi \gg k = j\varphi \gg k$; so (*) is necessary. And it is also sufficient, because a mapping $\varphi$ that
satisfies (*) can always be routed in such a way that $j\varphi$ appears on line $l = l(j\varphi, j, k)$
after $k$ steps: If $k \ge 1$, $j\varphi$ will appear on line $l(j\varphi, j, k - 1)$, which is one of the inputs
to $l$. Condition (*) says that we can route it to $l$ without conflict, even if $l$ is

[In IEEE Transactions C-24 (1975), 1145–1155, Duncan Lawrie proved that condition
(*) is necessary and sufficient for an arbitrary mapping $\varphi$ of the set $\{0, 1, \ldots, N - 1\}$
into itself, when the crossbar modules are allowed to be general $2 \times 2$ mapping modules
as in exercise 63. Furthermore the mapping might be only partially specified, with
$j\varphi = *$ (“wild card” or “don't-care”) for some values of $j$. The proof that appears in
the previous paragraph actually demonstrates Lawrie's more general theorem.]

(c) $i \bmod 2^k = j \bmod 2^k$ if and only if $k \le \rho(i \oplus j)$; $i \gg k = j \gg k$ if and only if
$k > \lambda(i \oplus j)$; and $i\varphi = j\varphi$ if and only if $i = j$, when $\varphi$ is a permutation.

(d) \(i^p ® jV) ^ ㊉ J) for all i and j if and only if X(ir(f ^ jr^p) < p(ir ® jt)= 

p(i ㊉ j) for all i and j，because r is a permutation. [Note that the notation can be 
confusing: Bit jrcj) appears in bit position j if permutation (/) is applied first, then r. 

(e) Given i ^ j we must prove that X{iip 寸 ㊉ jVVO 2 pi} 0 i)- Case 1, i and j are 

fixed by both 99 and Then ® = A(z 0 j) > p(i ® j). Case 2, i 屮手 i and 

ji/j = j: Then X(iip 咕 ㊉ = 入 (_ ㊉ jV) 2 p(^ ㊉ J). Case Z 、 ♦ i and ^ j: Then 

入 (ip# ㊉ ® j 寸） • Let k = p{i ® j), and suppose A(ip ㊉ < &• Then 
i mod 2 k = j mod 2 k and i^p 》 k = 钟 》 k. Hence k) = l(j 也 j, 念 ）， and that line

carries both i(f and j 水 But those two values cannot be equal, 

59. It is $2M_d(a, b)$, where $M_d(a, b)$ is the number of crossbars that have both endpoints
in $[a \,.\,.\, b]$. To count them, let $k = \lambda(a \oplus b)$, $a' = a \bmod 2^k$, and $b' = b \bmod 2^k$; notice that
$b - a = 2^k + b' - a'$, and $M_d(a, b) = M_{k+1}(a', 2^k + b')$. Counting the crossbars in the top
half and bottom half, plus those that jump between halves, gives
$M_{k+1}(a', 2^k + b') = M_k(a', 2^k - 1) + M_k(0, b') + \max(0, b' + 1 - a')$. Finally, we have
$M_k(0, b') = S(b' + 1)$; and $M_k(a', 2^k - 1) = M_k(0, 2^k - 1 - a') = S(2^k - a') = k2^{k-1} - ka' + S(a')$,
where $S(n)$ is evaluated in exercise 42.

60. A cycle of length 21 corresponds to a pattern uq ^ vq ^ V\ ^ U\ ^ U 2 V 2 ^ 

•••*<-> V 21 — i U 21-1 U 21 . where U 21 = uo and c u 4 — or c v —> means that the 

permutation sends u to v. c x means that = y ㊉ 1. 

We can generate a random permutation as follows: Given $u_0$, there are $2n$ choices
for $v_0$; then $2n - 1$ choices for $u_1$, only one of which causes $u_2 = u_0$; then $2n - 2$ choices
for $v_2$, then $2n - 3$ choices for $u_3$, only one of which closes a cycle, etc.

Consequently the generating function is $G(z) = \prod_{j=1}^{n} (2n - 2j + z)/(2n - 2j + 1)$. The expected
number of cycles, $k$, is $G'(1) = H_{2n} - \frac12 H_n = \frac12 \ln n + \ln 2 + \frac12 \gamma + O(n^{-1})$. The mean
of $2^k$ is $G(2) = (2^n n!)^2/(2n)! = \sqrt{\pi n} + O(n^{-1/2})$; and the variance is
$G(4) - G(2)^2 = (n + 1 - G(2))G(2) = \sqrt{\pi}\, n^{3/2} + O(n)$.

62. The crossbar settings in $P(2^d)$ can be stored in $(2d - 1)2^{d-1} = Nd - N/2$ bits. To get
the inverse permutation proceed from right to left. [See P. Heckel and R. Schroeppel,
Electronic Design 28, 8 (12 April 1980), 148–152. Note that any way to represent an
arbitrary permutation requires at least $\lg N! > Nd - N/\ln 2$ bits of memory; so this
representation is nearly optimum, spacewise.]

63. (i) $x = y$. (ii) $z$ must be even. (When $z$ is odd we have $(x \ddagger y) \gg z = (y \gg \lceil z/2 \rceil) \ddagger (x \gg \lfloor z/2 \rfloor)$,
even when $z < 0$.) (iii) This identity holds for all $w$, $x$, $y$, and $z$ (and also
with any other binary bitwise Boolean operator in place of &).

64. $(((z \mathbin{|} \bar\mu_0) + (z' \mathbin{\&} \mu_0)) \mathbin{\&} \mu_0) \mathbin{|} (((z \mathbin{|} \mu_0) + (z' \mathbin{\&} \bar\mu_0)) \mathbin{\&} \bar\mu_0)$. (See (86).)
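To see what an expression of this kind does, the Python sketch below (names ours) interleaves two 32-bit numbers, $x$ in the even bit positions and $y$ in the odd ones. Adding separately within $\mu_0$ and within $\bar\mu_0$ then adds the $x$ parts and the $y$ parts independently, each modulo $2^{32}$. That the exercise concerns exactly this use is our assumption; the masked addition itself is the standard trick of filling the unused positions with 1s so that carries ripple across them.

```python
MASK64 = (1 << 64) - 1
MU0 = 0x5555555555555555          # even bit positions
MU0_BAR = MASK64 ^ MU0            # odd bit positions


def masked_add(z: int, zp: int, chi: int) -> int:
    """Add the chi-parts of z and zp: positions outside chi are set to 1 in z,
    so carries pass straight through them, and are cleared afterwards."""
    return ((z | (MASK64 ^ chi)) + (zp & chi)) & chi


def interleaved_add(z: int, zp: int) -> int:
    """Two independent additions, one in the mu_0 bits and one in the others."""
    return masked_add(z, zp, MU0) | masked_add(z, zp, MU0_BAR)


def interleave(x: int, y: int) -> int:
    """Bits of the 32-bit x go to even positions, bits of the 32-bit y to odd ones."""
    z = 0
    for i in range(32):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z
```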

65. $x\,u(x^2) + v(x^2) = x\,u(x)^2 + v(x)^2$.

66. (a) $v(x) = (u(x)/(1 + x^\delta)) \bmod x^n$; it's the unique polynomial of degree less than $n$
such that $(1 + x^\delta)v(x) \equiv u(x)$ (modulo $x^n$). (Equivalently, $v$ is the unique $n$-bit integer
such that $(v \oplus (v \ll \delta)) \bmod 2^n = u$.)

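(b) We may as well assume that $n = 64m$, and that $u = (u_{m-1} \ldots u_1u_0)_{2^{64}}$,
$v = (v_{m-1} \ldots v_1v_0)_{2^{64}}$. Set $c \leftarrow 0$; then, using exercise 36, set $v_j \leftarrow u_j^{\oplus} \oplus (-c)$ and
$c \leftarrow v_j \gg 63$ for $j = 0, 1, \ldots, m - 1$.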

(c) Set $c \leftarrow v_0 \leftarrow u_0$; then $v_j \leftarrow u_j \oplus c$ and $c \leftarrow v_j$, for $j = 1, 2, \ldots, m - 1$.

(d) Start with c 0 and do the following for j = 0, 1. ..., m — 1: Set t Uj, 

t i f ㊉ （ t 《 3)〆 i f ㊉ （ f 《 6 ), f f ㊉ （ f 《 12), t t ㊉ （ t 《 24), t i- t ㊉ （ t 《 48), 

Vj d {t 》 61) X #9249249249249249. 

(e) Start with v i— u. Then, for j. = 1 ， 2, …， m — 1， set Vj Vj 0 (Wj-i 《 3) and 
(if j < m — 1 ) Vj +1 ^ %+i ㊉ (vj-i > 61). 

67* Let n = 21 — 1 and m = n — 2d. If |n < k < n we have 三 + x l (modulo 

x n +x m + 1)，where t = 2k —n is odd. Consequently, if v = (v n ^i ... ^ 1 ^ 0 ) 2 , the number 

切 =w ㊉ (((u 》 cQ ㊉ （ m 》 2 d) ㊉ (n 》 3cT) ㊉ ...）& — 2 l ~ d ) 

turns out to equal {v n - 2 • •. v^v\v n ^i ... ^ 2 ^ 0 ) 2 . For example, when / = 4 and d = 2, 
the square of uqx & + ••- + u\x + 从 0 modulo (x 7 +x 3 + 1) is uex 5 -\-u 5 x 3 + (uq + 
(us 0 U 3 )x 6 + ( 从 6 0 0 U 2 )x 4 + U\X 2 + To compute v, we therefore do a perfect

shuffle, v = [w/2 l \ J (w mod 2 l ). The number w can be calculated by methods like 
those of the previous exercise. [See R. P. Brent, S. Larvala, and P. Zimmermann, 
Math. Comp. 72 (2003), 1443-1452; 74 (2005), 1001-1002.] 

68. SRU t ,x,delta; PUT rM,theta; MUX x,t ,x. 

69. Notice that the procedure might fail if we attempt to do the $2^{d-1}$-shift first instead
of last. The key to proving that a small-shift-first strategy works correctly is to watch
the spaces between selected bits; we will prove that the lengths of these spaces are
multiples of $2^{k+1}$ after the $2^k$-shift.

Consider the infinite string Xk = … 1“ 0 2； " l ts 0 2； " l t2 O 2 、 l! 1 0 2k l to , which represents 
the situation where ti > 0 items need to move 2 k l places to the right. A 2-shift with 

mask 0 k = ... 0 t4 0 2； " l t3 l 2k 0 t2 0 2k l tl \ 2k \ to leaves us with the situation represented by 

the string Xk+i = • • • 1 T2 0 2 0 2 l Tl 0 2 0 2 1 T °, where exactly T\ = t 2 i + 亡 2 Z +1 items need 
to move 2 fc+1 / places. So the claim holds by induction on k. 

70. In the previous answer, notice that the string 如 =Qk ® 《 1) has a 1 at the 

right of each subword O 2 ^ of %&， and zeros elsewhere; hence 6k = in the notation of 

— i 

exercise 36. Furthermore ^k+i — (^k & 0k ) 》 2 • Therefore we can proceed as follows: 

Set 0 又， A; 0， and repeat the following steps while ^ ^ 0: Set x t 也 then 

x a; ㊉ （ x 》 2 Z ) for 0 < l < d. then 6k 4 — ^ (4 & 无）》 2' and k ^r- k 1. 

If $k < d$ at the end of this computation, the remaining masks $\theta_k, \ldots, \theta_{d-1}$ are
zero and those steps can be omitted from (80). Sometimes this procedure gives nonzero
masks 6k that actually do nothing useful, because = • • • = 0. To avoid such 

redundancy, change c 0k to c 6k + & ((# 》 2 fc ) | 2 d ~ k )))\ 

[See compress in H. S. Warren, Jr., Hacker's Delight (Addison-Wesley, 2002), §7–4;
also G. L. Steele Jr., U.S. Patent 6715066 (30 March 2004).]
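For experimenting, here is a Python sketch of a branch-free compress in the spirit of the routine cited above. It was written from memory rather than copied, so it is checked here against a naive loop. Each of six rounds moves the selected bits right by $2^i$ when bit $i$ of their remaining travel distance is 1; a parallel prefix XOR, the $x^{\oplus}$ operation of exercise 36, computes those bits. The names are ours.

```python
MASK64 = (1 << 64) - 1


def compress_naive(x: int, m: int) -> int:
    """Pack the bits of x selected by m into the low end, keeping their order."""
    out, j = 0, 0
    for i in range(64):
        if (m >> i) & 1:
            out |= ((x >> i) & 1) << j
            j += 1
    return out


def compress(x: int, m: int) -> int:
    """Branch-free 64-bit compress in six rounds (a sketch)."""
    x &= m
    mk = (~m << 1) & MASK64             # bit p set iff position p - 1 is unselected
    for i in range(6):
        mp = mk
        for s in (1, 2, 4, 8, 16, 32):  # parallel prefix XOR of mk
            mp ^= (mp << s) & MASK64
        mv = mp & m                     # selected bits that move 2**i places now
        m = (m ^ mv) | (mv >> (1 << i))
        t = x & mv
        x = (x ^ t) | (t >> (1 << i))
        mk &= ~mp
    return x
```

For example, `compress(0xFF, 0b10100100)` gathers the three selected 1 bits into `0b111`.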

71. Start with x i— y. Do a (—2 fc )-shift of x with mask 6k^ for k = d— 1. ..., 1 ， 0, using 

the masks of exercise 70. Finally set z x (or 2 ： if you want a “clean” result). 

0 d-i 

72. 2 2 x + y. 

73. Equivalently, $d$ sheep-and-goats operations must be able to transform the word
$(x_{(2^d-1)\pi} \ldots x_{1\pi}x_{0\pi})_2$ into $(x_{2^d-1} \ldots x_1x_0)_2$, for any permutation $\pi$ of $\{0, 1, \ldots, 2^d - 1\}$.
And this can be done by radix-2 sorting (Algorithm 5.2.5R): First bring the
odd numbered bits to the left, then bring the bits $j$ for odd $\lfloor j/2 \rfloor$ left, and so on.
For example, when $d = 3$ and $x^\pi = (x_3x_1x_0x_7x_5x_2x_6x_4)_2$, the three operations yield
successively $(x_3x_1x_7x_5x_0x_2x_6x_4)_2$, $(x_3x_7x_2x_6x_1x_5x_0x_4)_2$, $(x_7x_6x_5x_4x_3x_2x_1x_0)_2$. [See
Z. Shi and R. Lee, Proc. IEEE Conf. ASAP '00 (IEEE CS Press, 2000), 138–148.]

Historical note: The BESM-6 computer, designed in 1965. implemented half of 
the sheep-and-goats operation: Its «c6opKa» (“gather” or “pack”）command produced 
(2：& %) X, and its «pa36opKa» command (“scatter” or “unpack”) went the other way. 

74 . If c 2 i — Y2 c 2 /+i I = 2A > 0, we must rob A from the rich half and give it to 
the poor. There’s a position l in the poor half with c\ = 0; otherwise that half would 
sum to at least 2 d 一 1 . A cyclic 1 - shift that modifies positions l through (/ + t) mod 2 d 
makes = c/+fc+i for 0 < A; < t ， C / 十亡 = c/+t+i — S. Cz+j+i = $， and 

for all other k: here S can be any desired value in the range 0 < S < c/+ t +i. (We’ve 
treated all subscripts modulo 2 d in these formulas.) So we can use the smallest even t 

such that ci +1 + c/ +3 H - + ci +t +i = c/ + ci +2 H - h ci +t + △ + 6 for some S >0. 

(The 1-shift need not be cyclic，if we allow ourselves to shift left instead of right. 
But the cyclic property may be needed in subsequent steps.) 
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75. Equivalently, given indices 0 < io < < • • • < is — i < = 2 d and 0 = jo < recursively 

ji < … < js — i<js= 2 d ，we want to map (x 2 d_ 1 .. . xiXo ) 2 ^ (尤 （ 2 d — i )# • • • 尤 1 <^ 0 #) 2 ， 

where j^p = i r for j r < j < j r -\-i and 0 < r < s. If d = 1, a mapping module does this. Oftnan 

When d > 1， we can set the lefthand crossbars so that they route input i r to line magic masks 

i r 0 ((i r +r) mod 2). If s is even, we recursively ask one of the networks P(2 d_1 ) inside 
P(2 d ) to solve the problem for indices [{io^ 2 ^ - . - ， t}/2」and L{jo, ） 2 , … ， js}/2」，while Stockmeyer 

the other solves it for 「 {< 1 ， < 3 , • • • ， “-i ， 2 d }/2] and 「 {jo, J 2 , • • • ， Js}/2]• At the right of Arndt 

P(2 d ), one can now check that when j r < j < Jr+i ， the mapping module for lines j MMli 

and j ㊉ 1 has input i r on line j if j •三 r (modulo 2). otherwise z r is on line j •㊉ 1. 2ADDU 


A similar proof works when s is odd. 

Notes: This network is a slight improvement over a construction by Yu. P. Ofman ， 
Trudy Mosk. Mat. Obshchestva 14 (1965) ， 186—199. We can implement the correspond¬ 
ing network by substituting a “5-map” for a d-swap; instead of ( 69 ), we use two masks 
and do seven operations instead of six: y 4 — (x^>S)^ x 4 — x 0 

This extension of ( 71 ) therefore takes only d additional units of time. 

76. When a mapping network realizes a permutation, all of its modules must act as 
crossbars; hence G(n) > lgn!. Ofman proved that G(n) < 2.5nlgn，and remarked in 
a footnote that the constant 2.5 could be improved (without giving any details). We 
have seen that in fact G(n) < 2nlgn. Note that G(3) = 3. 

77 . Represent an n-network by (x2^-i . . • 尤 1 尤 0 ) 2 ， where Xk = [the binary representa¬ 
tion of A: is a possible configuration of Os and Is when the network has been applied to 
all 2 n sequences of Os and ls]，for 0 < A: < 2 n . Thus the empty network is represented 
by 2 2?1 — 1， and a sorting network for n = 3 is represented by ( 10001011 ) 2 . In general, 

x represents a sorting network for n elements if and only if it represents an n-network 
and ux = n -\- if and only if x = 2 ° + 2 1 + 2 3 + 2 7 + • • • + 2 2?1 一 1 . 

If x represents a according to these conventions, the representation of a[z:j] is 
(x © y) I (y 》 ( 2 n_? - where y = x fl n -j k 

[See V. R. Pratt, M. O. Rabin, and L. J. Stockmeyer, STOC 6 (1974) ， 122—126. 

78. If A: > lg(m — 1) the test is valid，because we always have X\ + ^2 + • • • + > 

xi \ X 2 \ • • • \ Xm, with equality if and only if the sets are disjoint. Moreover, we have 

(xi + . … + Xm) — (^1 I … .I Xm) ^ — l)(2 n k 1 + . … + 1) < (m - l)2 n k < 2 n . 

Conversely, if m > 2 fc + 2 and n > 2k, the test is invalid. We might have, for 
example, xi + ---+x m _ = ( 2 fc + 1 )( 2 一 _ 。打 - 说 - 1 ) + 2 一 - 1 = 2 n + (2 n ~ k - 2 n ~ 2k ~ 1 ). 

But \i n <2k the test is still valid when m = 2 fc + 2, because our proof shows that 
尤 1 + • • • + — ( 尤 1 I • • • I x m) < ( 2 fc + l)( 2 n —— 1 ) < 2 n in that case. 

79. x, = (x — 1) Sz x- (And the formula x, = ((x — 6 — 1) & a) + 6 corresponds to ( 85 ).) 
These recipes for x 1 and x, are part of Jorg Arndt’s “bit wizardry” routines (2001); 
Arndt credits ( 84 ) to Glenn C. Rhoads. 

80. Perhaps the nicest way is to start with x x — 1 as a signed number; then while 
x > 0, set x 4 — x & %， visit x，and set x i— 2x — x- (The operation 2x — x can in fact 
be performed with a single MMIX instruction, ‘2ADDU x ， x ， mimischi ’.） 

But that trick fails if % is so large as to be already “negative.” A slightly slower 
but more general method starts with x i— x and does the following while x ^ 0: Set 
t ^ x Sz —x，visit 乂 一 t, and set x <— x — t. 

81. ((z & x) — (z & %)) & (One way to verify this formula is to use ( 18 ).) 

82. Yes，by letting z = z f in ( 86 ): w \ (z ^ x)^ where w = ((( 2 : & X) 《 1) + 又 ） & X. 
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83. (The following iteration propagates bits of y to the right, in the gaps of a scattered 
accumulator t. Auxiliary variables u and v respectively mark the left and right of each 
gap; they double in size until being wiped out by w.) Set t 2 ： & %, 4 — (X 》 1) & 叉， 
v ((x "C 1) + 1) & X, ^ 3(u &: v\ u 3u^ v i— 3v^ and k 1. Then，while 

^ 0 , do the following steps: t ^ t \ ((t^>k)Szu)^ k k ^1^ ui— 
w ^r- ((v & ( 从》 1 ) & ii) 《 （ A: + 1 )) — ((w & (i; 《 1 ) & 6 ) 》 A :)，u u ((u & 6 ) 》 A :)， 
v v {(v & ii) 《 A:). Finally return the answer ((t 》 1) & %) | ( 2 : & %). 

84. z x— x — w ~ { z ^ x )： where w = ((( 2 : & X) 《 1) + 又 ) & X appears in answer 82; 
z —r x %ls the quantity t computed (with more difficulty) in the answer to exercise 83. 

85. (a) If x = L0C(a[i, j. k]) is the drum location corresponding to interleaved bits as 
stated, then LOC (a[i + 1 ， j ， A:]) = x ㊉ ((t ㊉ ((x & %) — X)) & X) an d LOC (a[i — 1 ， j, k])= 
x ㊉ ((x ㊉ ((x & x) — 1)) & x), where x = (11111)8, by ( 84 ) and answer 79. The formulas 
for LOC(a[z, j =t l.k]) and LQC(a[z. j, k =t 1]) are similar, with masks 2\ and 4%. 

(b) For random access, let’s hope there is room for a table of length 32 giving 

/ [( 24 ^ 3 ^ 2 ^ 1 ^ 0 ) 2 ] = {iAW 2 iiio)s- Then L0C(a[z, j, A:]) = (((/[A:] 《 1) + f[j]) 《 1) + f[i]. 

(On a vintage machine, bitwise computation of f would be much worse than table 
lookup, because register operations used to be as slow as fetches from memory.) 

(c) Let p be the location of the page currently in fast memory, and let 2 : = —128. 
When accessing location x, if x & 2 : 7 ^ p it is necessary to read 128 words from drum 
location x z (after saving the current data to drum location p if it has changed); 
then set p x z. [See J. Royal Stat. Soc. B-16 (1954) ， 53—55. This scheme of array 
allocation for external storage was devised independently by E. W. Dijkstra, circa 1960, 
who called it the “zip-fastener” method. It has often been rediscovered, for example 
in 1966 by G. M. Morton and later by developers of quadtrees; see Hanan Samet, 
Applications of Spatial Data Structures (Addison—Wesley, 1990). Georg Cantor had 
considered interleaving the digits of decimal fractions in Crelle 84 (1878) ， 242—258, §7; 
but he observed that this idea does not lead to an easy one-to-one correspondence 
between the unit interval [0.. 1] and the unit square [0.. 1] X [0.. 1]. 

86 . If (p . q ,r) bits of [i• j ， k) are in the part of the address that does not affect the 
page number, the total number of page faults is 2 (( 2 p-p — l)2 q+r + {2 q ^ q， — l) 2 p+r + 

(2 r — r, — 1)2 P+9 ). Hence we want to minimize 2— p +2— 9 +2— r over the set 0 < p’ < p ， 
0 < q < 0 < r < p -\- q ^ r = s. Since 2 a + 2 6 > 2 a_1 + 2 b+1 when a and b are 

integers with a > 6 + 1， the minimum (for all s) occurs when we select bits from right to 
left cyclically until running out. For example, when (p, q, r) = (2, 6, 3) the addressing 
function would be (J 5 J 4 J 3 念 2 J 2 众 iji<i 念 oJo<o) 2 . In particular, Tocher’s scheme is optimal. 

[But such a mapping is not necessarily best when the page size isn’t a power of 2. 
For example, consider a 16 X 16 matrix; the addressing function [ 331 ^ 12 ^ 1 ^ 0323130)2 is 
better than (j 3 ^ 3 i 2 ^ 2 ji^iJo^o )2 for all page sizes from 17 to 62, except for size 32 when 
they are equally good.] 

87. Set x 〜 ((x & ) 》 1 ); each byte (07 • • • ao )2 is thereby changed to 

(« 5 八 知）《 4 … ao) 2 - The same transformation works also on 30 additional letters 
in the Latin-1 supplement to ASCII (for example, ae ^ iE); but there’s one glitch, y ^ 13. 

[Don Woods used this trick in his original program for the game of Adventure 
(1976)，uppercasing the user’s input words before looking them up in a dictionary.] 

88 . Set 2 : (x ® 及 ） & / 1 ， then 2 : 4- ((x \h) — (y k, h)) ® 2 :. 

89. x [x k,{x\y)) \ (xk,y)^ z (x&y&/io) | (t^fio). [From 

the “nasty” test program for H. G. Dietz and R. J. Fisher’s SWARC compiler (1998).] 
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90 * Insert c z <— z j ((x Q y) &: iy either before or after ‘ 2 ： (x y) + z’• (The ordering 
makes no difference, because x+y 三 x ㊉ y (modulo 4) when x+y is even. Therefore MMIX 
can round to odd at no additional cost，using MOR. Rounding to even in the ambiguous 
cases is more difficult, and with fixed point arithmetic it is not advantageous.) 

91 . If ^ [x^ y] denotes the average as in (88). the desired result is obtained by repeating 
the following operations seven times，then concluding with 2 ： 4 — ^ [ x ^ y] once more: 

^ ^y], t •<— a & /i, m •<— (f 《 1) — (t 》 7), 

x < — (m & z) I (m & x), y (m & z) \ (m & y)， a a 1. 


Although rounding errors accumulate through eight levels, the resulting absolute error 
never exceeds 807/255. Moreover, it is ^ 1.13 if we average over all 256 3 cases, and 
it is less than 2 with probability ^ 94.2%. If we round to odd as in exercise 90， the 
maximum and average error are reduced to 616/255 and ^ 0.58; the probability of error 
< 2 rises to ^ 99.9%. Therefore the following MMIX code uses such unbiased rounding: 


x GREG ;y GREG ;z GREG 

alf GREG ;m GREG ;t IS $255 

ffhi GREG -1«56 repeat seven times: < 

1 GREG #0101010101010101 


X0R t ， x，y 
MOR z ， rodd，t 
AND t ， x ， y 
ADDU z ， z，t 


rodd GREG #4020100804020101 


MOR m，ffhi ， alf 
PUT rM，m 
MUX x，z，x > 

MUX y,y,z 
SLU alf,alf ,1 


after which the first four instructions are repeated again. The total time for eight 


a-blends (67r) is less than the cost of eight multiplications. 


92. We get Zj = \{xj + yj)/2\ for each j. (This fact, noticed by H. S. Warren, Jr., 
follows from the identity x + y = ((x j y) 《 1) — (x ® y). See also the next exercise.) 


93 . x — y = (x ® y )— ((元 & y) 《 1). (“Borrows” instead of “carries.”) 


94 . (x — l)j = (xj — 1 — bj) mod 256, where bj is the “borrow” from fields to the right. 
So tj is nonzero if and only if (xj ... ^ 0)256 < (1 … 1)256 = (256 J+1 — 1)/255. (The 
answers to the stated questions are therefore “yes” and “no .”） 

In general if the constant l is allowed to have any value (Z 7 … ZiZo) 256 ， opera¬ 
tion ( 90 ) makes tj # 0 if and only if (xj ... xo )256 < (Ij … Zo )256 and Xj < 128. 

95 . Use ( 90 ): Test if /i & (t(x ㊉ （(x 》 8 ) + (x 《 56))) | ㊉ （(x 》 16) + (x 《 48))) | 

t{x ㊉ （ (x 》 24) + (x 《 40))) I ㊉ （ (x 》 32) + (x 《 32)))) = 0, where t(x) = (x — l) hx. 
(These 28 steps reduce to 20 if cyclic shift or MOR is available, or 15 with BDIF and MOR.) 


96 . Suppose 0 < x^y < 256, Xh = [x/128\^ xi = x mod 128, yn = [y/128\^ yi = 
y mod 128. Then [x <y] = (xhyh[xi < yi])\ see exercise 7.1.1-106. And [xi <yi]= 
[yi + 127 — xi> 128]. Hence [x <y] = [{xyz)/128 \, where z = (x 127) + (y & 127). 

It follows that t = h&i (xyz) has the desired properties, when 2 ： = (x&ih) (y&ih). 
This formula can also be written t = h Sz ^(xyz) : where 5 = 〜 ((元 & /i) + (y & h ))= 
(x I /i) - (y & h) by ( 18 ). 

To get a similar test function for [xj < yj] = l—[yj < Xj], we just interchange x y 
and take the complement: t h 〜 {xyz、= h Sz (xyz)^ where 2 ： = (x & /i) + (y & h). 

97 . Set x $ © n ******** n ， y’ l z ©y，t (x I ((x | h) - l)) ^ (y ; | ((y ; | 

ml (t<l)-(t>7), t 4 - t^(x\((x\h)-l)), z <- (m& n ******** n )|( 沉 (20 steps.) 

98 . Set u 4 — 2 ： (x&^h)-\-(y^h)^ t 4- ㊉; r ㊉ (ii| (x ㊉ 2 ：))，r 4 — — 

z ^ x ^ v, w ^ y Q v. [This 14-step procedure invokes answer 96 to compute t = 
h & (xyz)^ using the footprint method of Section 7.1.2 to evaluate the median in only 
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three steps when x ^ y is known. Of course the MMIX solution is much quicker, if 
available: BDIF t，x，y; AD DU z，y，t; SUBU w，x，t.] 

99. In this potpourri, each of the eight bytes appears to be solving a different kind 

of problem; we must recast the conditions so that they fit into a common framework: 
fo = 卜 0 ㊉ ’ ！ ’ S 0]，/l = [ 尤 1 ㊉ ’*’ > o], /2 = [x 2 < - 1], /3 = [x 3 > ], / 4 = 

[x 4 > ; a ; — 1]，/ 5 = [ 郎 ㊉ ’0’ < 9], /6 = 卜 6㊉ 255 > 86]，/ 7 = [ 阶 ㊉ ’？’ < 3]. Aha! We 
can use the formulas in answer 96， adjusting d to switch between < and > as needed: 
a = ( ，？，（ 255) ， 0,00 ，*， ， ！，）256 = # 3fff300000002a21; b = h = # 7f7f7f7f7f7f7f7f ; 
c = 〜 (3(86)9( ， a，- l) ， z ，（， A，- 1)00)256 = # 7c29761f 053f 7f 7f (the hardest one); 

d = #8000800000800080; and e = h = #8080808080808080. 

100. We want Uj = Xj-\-yj +cj — 1 Ocj + 1 and Vj = Xj-yj — bj-^lObj^i , where Cj and bj are 
the “carry” and “borrow” into digit position j. Set u < — (x + y + (6 • • • 66 ) 16 ) mod 2 64 
and v -k— (x — y) mod 2 64 . Then we find Uj = Xj + yj + Cj + 6 — 16cj+i and v f j = 
Xj — yj — bj + 166j+i for 0 < j < 16， by induction on j. Hence u and v have the 
same pattern of carries and borrows as if we were working in radix 10 ， and we have 
u = u’ 一 6(ci6 ... C 2 Ci)i 6 , v = v — 6 ( 616 …& 2 &i)i 6 - The following computation schemes 
therefore provide the desired results (10 operations for addition, 9 for subtraction): 

y + (6 • • • 66)16 ， u’ i- x + y’ ， v i- x - y, 

t 4 — (xyu) & (8 … 88 ) 16 ， t {^yv) & (8 … 88 ) 16 ， 

u u — t (t 2 )\ v v 1 — t -\- (t 2 ). 

101. For subtraction, set z x — y; for addition, set 2 ： 4 — x -\-y # e8c4c4f cl 8 , where 
this constant is built from 256 — 24 = # e 8 . 256 — 60 = # c4，and 65536 — 1000 = 
# f cl 8 . Borrows and carries will occur between fields as if mixed-radix subtraction or 
addition were being performed. The remaining task is to correct for cases in which 
borrows occurred or carries did not; we can do this easily by inspecting individual 
digits, because the radices are less than half of the field sizes: Set t 2 ：& #8080808000, 
t (t 《 1) — (t 》 7) — ((t 》 15) & 1)， 2 ： t 2 ： — (t & # e8c4c4f cl 8 ). [See Stephen Soule, 
CACM 6 (1975)，344—346. We’re lucky that the ‘c’ in 4 f cl 8 ，is even.] 

102. (a) We assume that x = ... xo)i6 and y = ( 讲 5 • • ."o)i6，with 0 < yj < 5; 

the goal is to compute u = (从 15 … 从 o)i 6 and v = (W 15 … wo)i6，with components 
Uj = (xj + yj) mod 5 and Vj = (xj — yj) mod 5. Here’s how: 


u ^ x + v ^ x — y 51. 

t + 3/) & /i, t <— (i? + 3/) & /i, 

u i— u — ((t — (t 3)) & 5/); v v — ((t — (t 3)) & 5/). 


Here l = (1... l) i6 = (2 64 — 1)/15. h = 81. (Addition in 7 operations, subtraction in 8.) 
(b) Now x = (X 20 • • • xo)8, etc., and we must be more careful to confine carries: 


t ^ x 

z •<— (t Sz h) [y Sz Ti) , 
t ^r- (y \ z) t h. 
u^x-\-y-(t-\-(t^> 2 )); 


z ^ (x I h) - (y^h), 
t {y \ z) k, x k, h, 
v x — y -{-t + 


Here h = (4.. .4)8 = (2 65 — 4)/7. (Addition in 11 operations, subtraction in 10.) 

Similar procedures work，of course, for other moduli. In fact we can do multibyte 
arithmetic on the coordinates of toruses in general, with different moduli in each 
component (see 7.2.1.3—(66)). 
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103 . Let h and l be the constants in ( 87 ) and ( 88 ). Addition is easy: u ^ x \ ((x&/i)+y). 
For subtraction, take away 1 and add the complement: t (x&Z)》l ，v ^ t \ (t+(y ㊉ l)). 

104 . Yes, in 19: Let a = (((1901 《 4) + 1) 《 5) + 1 ， 6 = (((2099《4) + 12)《5) + 28. 

Set m (x 》 5) & # f (the month), c # 10 & 〜 ((x | 》 1)) 》 5) (the leap year 

correction), u 4 — b + # 3 & (( # 3bbeecc + c) 》 (m + m)) (the max_day adjustment), and 
t ((x 0 a 0 (x — a)) I (x 0 0 — x))) & # 1000220 (the test for unwanted carries). 

105 . Exercise 98 explains how to compute bytewise min and max; a simple modification 
will compute min in some byte positions and max in others. Thus we can “sort by 
perfect shuffles” as in Section 5.3.4. Fig. 57， if we can permute bytes between x and y 
appropriately. And such permutation is easy, by exercise 1 . [Of course there are much 
simpler and faster ways to sort 16 bytes. But see S. Albers and T. Hagerup. Inf. and 
Computation 136 (1997)，25—51，for asymptotic implications of this approach.] 

106 . The n bits are regarded as g fields of g bits each. First the nonzero fields are 
detected (ti)，and we form a word y that has (y g 一 i • •. yo )2 in each g-bit field, where 
yj = [field j of x is nonzero]. Then we compare each field with the constants 2〃 -1 ， 
..., 2 ° (亡 2 )， and form a mask m that identifies the most significant nonzero field of x. 
After putting g copies of that field into 2 ：， we test 2 ： as we tested y {ts). Finally an appro¬ 
priate sideways addition of and tz (^-bit-wise) yields A. (Try the case g = 4：. n = 16.) 

To compute 2 A without shifting left，replace c t 2 《 1’ by H 2 + 亡 2 ’， and replace the 
final line by w (((a • (亡 3 ㊉（亡 3 》 g))) mod 2 n ) 》 （n — g)) • l then w is 2 Xx . 


107. 


h 

GREG 

#8000800080008000 


CSNZ 

X ， 

q，z 


SUBU 

t ,q,t 

ms 

GREG 

#00ff0fOf33335555 


CSNZ 

lam ， q，t 


OR 

t ,t ,y 

1H 

ANDN 

q,x,m5 

2H 

SLU 

q 

x，16 


AND 

t ,t ,h 


SRU 

z ， x，32 


ADDU 

X 

x，q 

5H 

SLU 

q,t,15 


CSNZ 

x ， q，z 


SLU 

q 

x，32 


ADDU 

■t ,t ,q 


ZSNZ 

lam,q,32 


ADDU 

X 

x，q 


SLU 

q,t,30 


ANDN 

q ， x ， m4 

3H 

ANDN 

y 

x,ms 


ADDU 

t ,t，q 


SRU 

z ， x ， 16 

4H 

XOR 

t 

x，y 

6 H 

SRU 

q,t，60 


ADD 

t,1am,16 


OR 

q 

y，h 


ADDU 

lam,la 


The total time 25v (and no mems) should be increased by v for a fair comparison with 
( 56 ), because ( 56 ) doesn’t clobber x. 

• • 2 e # 

108* For example, let e be minimum so that n < 2 e • 2 . If n is a multiple of 2 e . we 

can use 2 e fields of size n/2 e , with e reductions in step Bl: otherwise we can use 2 e 
fields of size 2「 lgn ^ with e + 1 reductions in step Bl. In either case there are e 
iterations in steps B2 and B5, so the total running time is 0(e) = O(loglogn). 

109* Start with x x & —x and apply Algorithm B. (Step B4 of that algorithm can 
be slightly simplified in this special case, using a constant l instead of x ® y.) 

110. Let s = 2 d where d = 2 e — e. We will use s-bit fields in n-bit words. 

Kl. [Stretch x mod s.] Set y x (s — 1). Then set t ^ y Sz ft j and y y ㊉ t ㊉ 

(s —1)) for e > j > 0. Finally set y (y<^s) — y. [If x = (x 2 g ^i ... 尤 0)2 
we now have y = (y 2 e -i … yo) 2 s ， where yj = ( 2 s — l)xj[j < d].] 

K2. [Set up minterms.] Set y y ㊉ (a 2 e — 1 … ao) 2 s ， where aj = fj^dj for 0 < j < d 

and aj = 2 s — 1 for d < j < 2 e . 

K3. [Compress.] Set y t [y 》 2 J s) for e > j > 0, then y y Sz (2 s — 1). [Now 
y = 1 《 （x mod s). This is the key point that makes the algorithm work.] 
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K4. [Finish.] Set y ^ y \ (y 《 2 J s) for 0 < j < e. Finally set y ^ y ^ (/i 2 e ,j ® 
—((；r 》 j) & 1 )) for d y < 2 e . I 

111. The n bits are divided into fields of s bits each, although the leftmost field might 
be shorter. First y is set to flag the all-1 fields. Then t = (. .. tito) 2 s contains candidate 
bits for q, including “false drops” for certain patterns 01with s < k < r. We always 
have utj < 1, and tj ^ 0 implies tj-i = 0. The bits of u and v subdivide t into two 
parts so that we can safely compute m = (t ^ 1 ) | (t 2 ) | • • • | (t^>r)^ before making 
a final test to eliminate the false drops. 

112. Notice that if g = x & (x 《 1) & • • • & 《 (r — 1)) & 〜 (x 《 r) then we have 

If we can solve the stated problem in 0(1) steps, we can also extract the most 
significant bit of an r-bit number in 0(1) steps: Apply the case n = 2r to the number 
2 2?1 — 1 — x. Conversely, a solution to the extraction problem can be shown to yield a so¬ 
lution to the l r 0 problem. Exercise 110 therefore implies a solution in O(log logr) steps. 


113. Let 0 7 = 0, Xq = xo^ and construct x [/ = for 1 < z < r as follows : If 
Xi = a Oj b and ^ {+，一 ， 《}，let i = (i — 1 )’ + 1 and x\, — a b\ where a = x-, 
if a = Xj and a = a if a = Ci. If 而 =a 《 c，let i = (i — 1)’ + 2 and (x 1 ^= 
(a’&(L2 n_c 」一1), :^一丄 《c). If Xi = a+ 6 , let % — (z — 1) / +6 and let {x l ^ i _ 1 y +1 ,... 
compute ((a, & /i) + (V & 八 )) ㊉ ((a, ㊉ 6 ,) & /i)，where h = 2 n-1 . And \i Xi = a — b, do 
the similar computation ((a 7 | h) — (b f & h)) ® ((a 7 = b 1 ) & h). Clearly r < 6 r. 

114. Simply let Xi = Xj^ X k (i) when Xi = Xj^ Xi = Ci ⑷ when 

Xi = Ci Oi Xfc ⑷， and Xi = Xj^ Ci when Xi = Xj ⑷ Oi c<, where Ci = Ci when Ci is a 
shift amount，otherwise Ci = [Ci … c《） 2 n = (2 mn — l)ci / (2 n — 1). This construction is 
possible thanks to the fact that variable-length shifts are prohibited. 

[Notice that if m = 2 d , we can use this idea to simulate 2 d instances of f(x,yi) •’ 
then O(d) further operations allow “quantification.”] 


115. (a) 2 : & ( 无《 1) & (x 《 2 )，y x {x -\- z). [This problem was posed to the 

author by Vaughan Pratt in 1977.] 

(b) First find xi (x 《 1) 元 and x r 4 — x & (x <C 1 ), the left and right ends 

of x，s blocks; and set x’ r = x r ^ {x r — 1). Then 2 ： e <— x f r (x f r — (xi & po)) and 
z 0 ^ x f r ^ (x f r — (xi & /io)) are the right ends that are followed by a left end in even or 
odd position，respectively. The answer is y x & (x + (z e & po) + & /io)); it can be 

simplified to y t & (x + ( 2 ： e ㊉ & /io))). 

(c) This case is impossible，by Corollary I. 


116. The language L is well defined, by Lemma A (except that the presence or absence 
of the empty string is irrelevant). A language is regular if and only if it can be defined by 
a finite state automaton, and a 2 -adic integer is rational if and only if it can be defined 
by a finite state automaton that ignores its inputs. The identity function corresponds 
to the language L = l(OUl)*，and a simple construction will define an automaton that 
corresponds to the sum，difference, or Boolean combination of the numbers defined by 
any two given automata acting on the sequence X 0 X 1 X 2 ... . Hence L is regular. 

In exercise 115, L is (a) 11*(000*1(0 U 1)* U 0*); (b) 11*(00(00)*1(0 U 1)* U0*). 


117 * Incidentally, the stated language L corresponds to an inverse Gray binary code: 
It defines a function with the property that f(2x)= 〜 /(2x + 1)，and g(f(2x))= 
g(f(2x + 1)) = where g{x) = ㊉ (x 》 1) (see Eq. 7.2.1.1-(g)). 

118 . If x = (x n 一 i • •. x\Xq)2 and 0 < < 2 J for 0 < j < n, we have a j x j = 

j=o ( a j 二（无 & 2- 7 )). Take aj = to get a; 》 1. 
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Conversely, the following argument by M. S. Paterson proves that monus must be 
used at least n — 1 times: Consider any chain for f(x) that uses addition，subtraction, 
bitwise Booleans，and k occurrences of the “underflow” operation y<iz = {2 n — z\. 

If A: < n—1 there must be two n-bit numbers x and x" such that x mod 2 = x 11 mod 2 = 
0 and such that all k of the <’s yield the same result for both x and x" • Then 
f(x,) mod = f{x n ) mod when j = p(x f ® x n ). So f(x) is not the function x 1. 

119. ^ x 0 y, / •(— 2 P & 5 — 1). (See ( 90 ).) 

120. Generalizing Corollary W. these are the functions such that /(xi，• • • ， 三 
f(yi ,..., ym) (modulo 2 k ) whenever Xj = yj (modulo 2 k ) for 1 < j < m，for 0 < A: < n. 
The least significant bit is a binary function of m variables, so it has 2 2rn possibilities. 
The next-to-least is a binary function of 2m variables, namely the bits of (x\ mod 4, 
… ,Xm mod 4), so it has 2 22m ; and so on. Thus the answer is 22 m + 2 2 m +---+ 2 nm > 

121. (a) If / has a period of length pq : where g > 1 is odd，its p-fold iteration /[ p ] has a 
period of length q. say yo 4 yi 4 ••• 4 y q = yo where y j+1 = / [p] (yj) and yi ^ y 0 . But 
then, by Corollary W，we must have yo mod 2 n_1 ^ yi mod 2 n — 1 H … H y q mod 2 n ~ 1 
in the corresponding (n — l)-bit chain. Consequently yi = yo (modulo 2 n ~ 1 ), by 
induction on n. Hence y 1 = y 0 ^ 2 n-1 ，and y 2 = yo, etc., a contradiction. 

(b) x\ = xo -\- xo, X 2 = xo (p — 1 )^ X 3 = x\ I a period of length p starts with 
the value xo = (1 + 2 P + 2 2p + • • •) mod 2 n . 

122. Subtraction is analogous to addition; Boolean operations are even simpler; and 

constants have only one bit pattern. The only remaining case is x r = Xj 》 c，where we 
have SV = + c; the shift goes left when c < 0. Then V pqr = V( p + C )(g+ C )j, and 


& L 2P _ = ((A & 1_2 P+C - 2 朴 C 」） >c) & (2 n - 1). 

Hence |X pgr | < |I( p+c )( g + c )j | < Bj = B r by induction. 

123 . If x = (xg^i ... to) 2 ，note first that t = 2 9 ~ 1 {xo . .. x g -i)29 in ( 104 ); hence y = 
(xo ... x g ^i )2 as claimed. Theorem P now implies that broadword steps are 

needed to multiply by a 9 +i and by a g -\. At least one of those multiplications must 
require lgp」or more steps. 


124. Initially t ^ 0, xo = Uo = {1,2,..., 2 n ~ 1 }, and l 7 0. When advancing 
t t + 1 ， if the current instruction is rj 士 rfc we simply define xt = xy ± xy and 

i 1 t. The cases rj o and 4— c are similar. 

If the current instruction branches when ri < rj. define xt = xt-i and let Vi = 
{x G Ut^i I < Xj/}, Vo = Ut \Vi. Let Ut be the larger of Vo and Vi； branch if 
Ut = Vi. Notice that \Ut\ > \Ut^i |/2 in this case. 

If the current instruction is ri 4 — rj^rk ： let W = {x G Ut^i \ [2 lg n+s —2 s 」# 0 

for some s G and note that \W\ < | lgn < Let V c = {x G Ut-i \ VF 

x k / = c} for |c| < n, and V n = Ut—i \ VF \ U| c |<n Lemma B tells us that at most 


By + 1 < 2 2 卜 1 - 1 + 1 of the sets V c are nonempty. Let Ut be the largest; and if it is T4, 
define xt = xy 》 c, i’ t t. In this case \Ut \ > (\Ut-i \ — 2 卜 1 +e+ ’)/(2 2 * _1 _1 + 1). 

Similarly for n M[rj mod2 m ] or M[rj mod 2 m ] let W = {x ^ Ut-i \ 

x & [2 m+s — 2 S J ^ 0 for some s G 5j/}, and \4 = {x G Ut-i \ W \ xy mod 2 m = z}, 
for 0 < z < 2 m . By Lemma B, at most By < 2 2t_1 - 1 of the sets V z are nonempty; let 
Ut = V z be the largest. To write Vi in M[z]^ define xt = xt-i^ z n 4— z 7 ; to read Vi from 
M[z], set i ^ t and put xt = x z ff if z 11 is defined, otherwise let xt be the precomputed 
constant M[z]. In both cases \Ut \ > {\Ut-i \ — 2 t-1 m)/2 2t__1 一 1 is sufficiently large. 
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If t < f we cannot be sure that r\ = px. The reason is that the set W = 
{x G Ut I x & |_2 lg n+s — 2 s 」# 0 for some s G } has size \W\ < |5i/1 lgn < 2 t+e+ ^, 
and \U t \W\ > 2^ e+f -^+^- 2*+ e +/ > 2 2 '- 1 > \{x^ & [2 lgri - lj I 吻 G U t \W}\. Two 
elements oi Ut \W cannot have the same value of px = Sz [2 lg n — 1」. 

[The same lower bound applies even if we allow the RAM to make arbitrary 
2 2 亡 _1 一 way branches based on the contents of (ri,..., r/) at time t.] 

125. Start as in answer 124. but with Uo = [0 .. 2 9 ). Simplifying that argument by 
eliminating the sets W will yield sets such that \Ut\ > 2 5 / max(2 m , 2n) t ; for example, 
at most 2 n different shift instructions can occur. 

Suppose we can stop at time t with t < [lg(/i + 1)」. The proof of Theorem P 
yields p and q with x R & [2 P — 2 q \ independent of x & [2 P+S — 2 q+s \. Hence the hinted 
extension of Lemma B shows that x R takes on at most 2 2t_1 < different values, 

for every setting of the other bits {x & \2 P+S — 2 9+s 」| s G S"*}. Consequently r\ = 
can be the correct value of x R for at most 2( h-1 )/ 2+9 — h values of x. But 2^ h ^ 1 ^ 2+9 ~ h 
is less than \Ut\, by ( 106 ). 

126. M. S. Paterson has proposed a related (but different) conjecture: For every 2-adic 
chain with k addition-subtraction operations, there is a (possibly huge) integer x with 
vx = A: + 1 such that the chain does not calculate 2 Xx • 


127. Use exercise 110 to compute [(1 《 (z/x)) Sz a^O] for a suitable constant a. (The 
special case vx — n may also need to be handled separately.) 

128* The weaker lower bound ^(log log n) follows from Theorem R. because px = 
v(x — 1 ) when x — x x — 

129* Note that the suffix parity function x® is considered in exercises 36 and 117. 

130. If the answer is “no，” the analogous question with variable a suggests itself. 

131. This program does a typical “breadth-first search,” keeping LINK(q) = r. Regis¬ 
ter u is the vertex currently being examined; v is one of its successors. 


OH LDQU r ， q，link 
SET u,r 
1H LDOU a ， ii，arcs 
BZ a ， 4F 


1 

1 

网 

R 


r 卜 LINK(q). 
u r. 

- ARCS(u). 


a 


Is S[u] = 0? 


STQU v ， q，link 
STQU r ， v，link 
SET q，v 
3H PBNZ a,2B 


i?|-|Q| 

R\-\Q\ 

R\-\Q\ 

s 


LDOU 

v ， a，tip 

S 

v - 

TIP(a). 

4H LDOU 

u,u,link 

R 

LDQU 

a,a,next 

s 

a - 

(- NEXT (a). 

CMPU 

t ， u，r 


LDOU 

t,v,link 

s 

t - 

(- LINK(v). 

PBNZ 

t，lB 

网 

PBNZ 

t ， 3F 

s 

Is 

v G i?? 





LINK(q) 4 - v. 
LINK(v) r. 
q 4- v. 

Loop on a. 
m LINK(u). 
Is u # r? 
Continue if so. 

I 


132* (a) We always have t(U) C = cr{U). And equality holds if and only if 

2 U C p{u) for all G C/ and u G U. 

(b) WeVe proved that t(U) C hence T C. U. And if t G T we have 2 l C p u 

for all u ^ U. Therefore cr(T) C r(T). 

(c) Parts (a) and (b) prove that the elements of C n represent the cliques. 

(d) li u C. v then u^zpk C. v&:pk and u&^Sk C vSzSk] so we can work entirely with 
maximal entries. The following algorithm uses cache-friendly sequential (rather than 
linked) allocation, in a manner analogous to radix exchange sort (Algorithm 5.2.2R). 

We assume that W\ .. .w s is a workspace of s unsigned words，bounded by i(；o = 0 
and Ws-^i = 2 n — 1. The elements of C'^_ 1 appear initially in positions w\ ... Wm, and 
our goal is to replace them by the elements of . 

Ml. [Initialize.] Terminate if pk = 2 n — 1. Otherwise set I? z 1, j m. 
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M2. [Partition on v.] While Wi Sz v = 0, set z z + 1. While set 

j j — 1. Then if i > j，go to M3; otherwise swap Wi Wj, set z z + 1, 
j j — 1. and repeat this step. 

M3. [Split Wi … Wm.\ Set l<—j,pi—s-\-l. While i < m, do subroutine Q with 
u = Wi and set z z + 1. 

M4. [Combine maximal elements.] Set m *<— /. While p < s. set m m + 1 ， 
Wm 4 - w p , and p <— p 1. | 

Subroutine Q uses global variables j ， k ， I ， p, and v. It essentially replaces the word u 
by u = u Sz pk and u" = u Sz Sk^ retaining them if they are still maximal. If so, u goes 
into the upper workspace w p .. .w s but u" stays below. 

Ql. [Examine u f .] Set w u Sz pk and g s. If w = u. go to Q4. 

Q2. [Is it comparable?] If g < p，go to Q3. Otherwise if w w q = go to Q7. 
Otherwise if wSzw q = w q . go to Q4. Otherwise set q t q — 1 and repeat Q2. 

Q3. [Tentatively accept u’•] Set p p— 1 and w p w. Memory overflow occurs 
if p < m + 1. Otherwise go to Q7. 

Q4. [Prepare for loop.] Set r p and w p ^i <— 0. 

Q5. [Remove nonmaximals.] While w \ w q ^ w, set q ^ q — 1. While w \ w r = 
set r r + 1- Then if g < r，go to Q6; otherwise set w q w r , w r 0, 

l ， r4—r — 1， and repeat this step. 

Q6. [Reset p.] Set w q 4 — w and p i— q- Terminate the subroutine ii w = u. 

Q7. [Examine u 11 .] Set w ^ u Sz v. If w = w q for some q in the range 1 < g < j, 
do nothing. Otherwise set /•<—/ + 1 and wi w. | 

In practice this algorithm performs quite well; for example, when it is applied to the 
8 x8 queen graph (exercise 7-129), it finds the 310 maximal cliques after only 57283 
mems of computation, using 397 words of workspace. It finds the 10188 maximal 
independent sets of that same graph after about 26 megamems，using 15090 words; 
there are respectively (728, 6912, 2456, 92) such sets of sizes (5, 6, 7, 8)，including the 92 
famous solutions to the eight queens problem. 

Reference: N. Jardine and R. Sibson ，Mathematical Taxonomy (Wiley, 1971), Ap¬ 
pendix 5. Many other algorithms for listing maximal cliques have also been published. 
See, for example, W. Knodel, Computing 3 (1968). 239-240, 4 (1969). 75; C. Bron 
and J. Kerbosch ，CACM 16 (1973) ， 575—577; S. Tsukiyama, M. Ide ， H. Ariyoshi, and 
I. Shirakawa, SICOMP 6 (1977), 505—517; E. Loukakis ，Computers and Math, with 
Appl. 9 (1983) ， 583-589; D. S. Johnson, M. Yannakakis. and C. H. Papadimitriou, Inf. 
Proc. Letters 27 (1988) ， 119-123. See also exercise 5-23. 

133. (a) An independent set is a clique of G; so complement G. (b) A vertex cover is 
the complement of an independent set; so complement G. then complement the outputs. 

134. a ^ 00， 6 01, c ^ 11 is the first mapping of class II. 

135. The unary operators are simple: -^(xix r ) = x r xi\ o(xix r ) = x r x r \ u(xix r ) = xixi. 

And X[X r yiy r = (zi V z r )(zi A z r ). where ㊉ and 2 ： r = ㊉ y r . 

136* (a) Classes II ， III ， IV a . and IV C all have the optimum cost 4. Curiously the 
functions zi = xiWyiW (x r /\y r ). z r = x r \Zy r work for the mapping (a. 6, c) ^ (00, 01 ， 11 ) 
of class II as well as for the mapping (a, 6, c) 4 (00, 01 ， 1*) of class IV C . [This operation 
is equivalent to saturated addition, when a = 0, 6 = 1， and c stands for “more than 1.”] 
(b) The symmetry between a ，6 ， and c implies that we need only try classes I ， 
IV a . and V a ; and those classes turn out to cost 6 ， 7, and 8. One winner for class I， with 
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(a, 6 , c) ^ (00, 01 ， 10), is zi = v r A ui^ z r = vi A u V) where m = xi ^ yi^ u r = x r ® y r , 
vi = y r ^ m. and v r = yi ❼ u r . [See exercise 7.1.2—60, which gives the same answer but 
with zi ^ z r . The reason is that we have (x -\- y -\- z) mod 3 = 0 in this problem but 
(x y — z) mod 3 = 0 in that one; and zi z r is equivalent to negation. The binary 
operation z = x o y in this case can also be characterized by the fact that the elements 
(x ， y, z) are all the same or all different; thus it is familiar to people who play the game 
of SET. It is the only binary operation on n-element sets that has n! automorphisms 
and differs from the trivial examples x o y = x or x o y = y.] 

(c) Cost 3 is achieved only with class I: Let (a ， 6 ， c) 4 (00, 01 ， 10) and zi = 
{xi V X r ) A yi^ z r = x r A y r - 

137. In fact ， 2 ： = (x 1) Sz y when (a ， 6 ， c) 4 (00, 01 ， 10). [It’s a contrived example.] 

138* The simplest case known to the author requires the calculation of two binary 
operations, such as 



each has cost 2 in class V a ，but the costs are (3,2) and (2,3) in classes I and II. 

139. The calculation of Z 2 is essentially equivalent to exercise 136(b); so the natural 
representation (m) wins. Fortunately this representation also is good for Zi, with 
zn = xi A yi ^ Z\ r = x r A yr • 

140* With representation (m). first use full binary adders to compute (aiao )2 = 
xi yi zi and ( 6160)2 = + y r + 2 ： r in 5 + 5 = 10 steps. Now the “greedy footprint” 

method shows how to compute the four desired functions of (ai ， ao, 61 ， 60 ) in eight 
further steps: ui = ai A 60 , u r = ao A 61 ； t\ = ai ㊉ 60， ^2 = ao ㊉ 6 i, 亡 3 = ai ㊉ 亡 2 , 
= ao ㊉ 亡 1 ，切 = 亡 3 八 [ 1 ，Wr = 亡 4 八 [Is this method optimum?] 

141. Suppose we’ve computed bits a = aoai … ci 2 m-i and b = bobi ... b 2 m-i such that 

a s = [s = 1 or s = 2 or s is a sum of distinct Ulam numbers < m in exactly one way], 
b s = [s is a sum of distinct Ulam numbers < m in more than one way], 

for some integer m = U n > 2 . For example, when m = n = 2 we have a = 0111 and 
b = 0000. Then {s \ s < m and a s = 1 } = {t/i ， … ， U n }; and U n +i = min{s \ s > m 
and a s = 1}. (Notice that a s = 1 when s = [/ n _i + U n .) The following simple bitwise 
operations preserve these conditions: n n + 1 , m •(— U n , and 

(^m • • • 0^2m—l ， bm • • • ^2m — 1) ^~ ((^m • • • ^2m —1 © ^0 • • • ^m —l) & • • • ^2m— 1 ^ 

{cim • • • (^2m—l & ^0 • • • ^m—l) | • • • ^2m — 1) - 

where a s = b s = 0 for 2U n —\ < s < 2U n on the right side of this assignment. 

[See M. C. Wunderlich. BIT 11 (1971) ， 217-224; Computers in Number Theory 
(1971) ， 249-257. These mysterious numbers, which were first defined by S. Ulam in 
SIAM Review 6 (1964) ， 348. have baffled number theorists for many years. The ratio 
U n !n appears to converge to a constant, ^ 13.52; for example. U 20000000 = 270371127 
and t /40000000 = 540752349. Furthermore, D. W. Wilson has observed empirically that 
the numbers form quasi - periodic “clusters” whose centers differ by multiples of another 
constant, ^ 21.6016. Calculations by Jud McCranie and the author for U n < 640000000 
indicate that the largest gap U n — U n —i may occur between t /24576523 = 332250401 and 
t /24576524 = 332251032; the smallest gap U n — U n 一 i = 1 apparently occurs only when 
U n G {2, 3,4, 48}. Certain small gaps like 6 ， 11 ， 14. and 16 have never been observed.] 
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142. Algorithm E in that exercise performs the following operations on subcubes: 
(i) Count the *s in a given subcube c. (ii) Given c and c . test if c C c . (iii) Given 
c and c 7 , compute c LI c (if it exists). Operation (i) is simple with sideways addition; 
let’s see which of the nine classes of two-bit encodings ( 119 ) ， ( 123 ), ( 124 ) works best 
for (ii) and (iii). Suppose a = 0 ， 6=l，c = *; the symmetry between 0 and 1 means 
that we need only examine classes I, III ， IV a ， IV C , V a ，and V c . 

For the asterisks-and-bits mapping (0 ， 1，*) ^ (00,01,10)，which belongs to 
class I， the truth table for c ^ c is 010*100*110***** in each component. (For example, 
0 C * and * 2 1. The *s in this truth table are don’t-cares for the unused codes 11.) 
The methods of Section 7.1.2 tell us that the cheapest such functions have cost 3; 
for example, c C c ii and only if ((6 ® b f ) \ a) a = 0. Furthermore the consensus 
cU c = c n exists if and only \i vz = 1, where 2 ： = ( 6 ㊉ 6 ’) & 〜 (a ㊉ a’). And in that 
case, a" = (a ㊉ b ㊉ b’）Sz 〜 （a ㊉ a 7 ), b n — (b \ b 1 ) Sz z. [The asterisk and bit codes were 
used for this purpose by M. A. Breuer in Proc. ACM Nat. Conf. 23 (1968) ， 241—250. 

But class III works out better, with (0, 1， *) 4 (01 ， 10, 00). Then c C c 7 if and only 
if (c/&cj) I (c r ^LC r ) = 0 ; cUc = c 11 exists if and only \iuz = 1 where 2 : = x = c t \ 

y = c r \c r ] and c/ = x ® 2 :, = y ® 2 ：. We save two operations for each consensus, 

with respect to class I， compensating for an extra step when counting asterisks. 

Classes IV a ， V a ，and V c turn out to be far inferior. Class IV C has some merit, 
but class III is best. 

143. f(x) = ((x&mi)*Cl7) I ((x>17)&mi) | ((x&m 2 )《15) | ((x>15)&m 2 ) | ((x&m 3 )<C 

10) I ((x 》 10) & m 3 ) I ((x & m4) 《 6 ) I ((x 》 6 ) & m。, where mi = # 7f 7f 7f 7f 7f 7f, 
m 2 = # f ef ef ef ef ef e, m 3 = # 3f 3f 3f 3f 3f 3f 3f , = # f cf cf cf cf cf cf c. [See, for 

example, Chess Skill in Man and Machine, edited by Peter W. Frey (1977), page 59. 
Five steps suffice to compute f(x) on MMIX (four M0R operations and one OR), since 
f(x) = q • x • q’ \q - x-q with q = # 40a05028140a0502 and q = #2010884422110804」 

144. Node j •㊉ (A: 《 1)，where k = j Sz —j. 

145. It names the ancestor of the leaf node j | 1 at height h. 

146. By ( 136 ) we want to show that A(j & —i) = pi when l — 2 pl < z < / < j < / + 2 pl . 
The desired result follows from ( 35 ) because —/ < —z < —/ + 2 p/ . 

147* (a) 7TVj = f3vj = j, avj = 1 《 pj•' and rj = A. for 1 < j < n. 

(b) Suppose n = 2 ei +• • -+2 et where ei > • • • > > 0, and let — 2 ei +• ••- \-2 ek 

for 0 < k < t. Then nvj = j and /3vj = avj = rik for nk — 1 < jf < rik- Also rrik = ^n k _ 1 

for 1 < A: < where I? 。 = 八 ； all other rj = A. 

148. Yes, if 7 ryi = 010000, ixy 2 = 010100, ttx 1 = 010101 ， ttx 2 = 010110, ttx 3 = 010111 ， 
/3x 3 = 010111 ， /3y 2 = 010100, /3x 2 = 011000, pyx = 010000, and /3xi = 100000. 

149. We assume that CHILD(i?) = SIB(^) = PARENT(i?) = A initially for all vertices v 
(including v = A), and that there is at least one nonnull vertex. 

51. [Make triply linked tree.] For each of the n arcs u v (perhaps v = A), set 
SIB(ii) 4 - CHILD (?;), CHILD (r) m，PARENT (ii) 4 - 队 (See exercise 2.3.3— 6 .) 

52. [Begin first traversal.] Set p <— CHILD (A), n <— 0, and A0 < - 1. 

53. [Compute /3 in the easy case.] Set n n + 1, 7rp n, 丁 n 八 ， and 
An 1 + A(n 》 1). If CHILD(p) ^ A, set p CHILD(p) and repeat this step; 
otherwise set f3p n. 

54. [Compute r, bottom-up.] Set r/3p PARENT (p). Then if SIB(p) _ 八 ， set 
p SIB(p) and return to S3; otherwise set p 4 — PARENT(p). 
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55. [Compute /3 in the hard case.] If p A. set h A(n & — 7 rp)，then f3p *<— 
((n ^ h) I 1) 《 /i，and go back to S4. 

5 6 . [Begin second traversal.] Set p *<— CHILD (A), AO •<— An ，/3A *<— aA *<— 0. 

57. [Compute a, top-down.] Set ap a (PARENT (p)) | (/3p & —/3p). Then if 

CHILD (p) A, set p CHILD (p) and repeat this step. 

5 8 . [Continue to traverse.] If SIB(p) ^ A, set p SIB(p) and go to S7. 
Otherwise set p 4 — PARENT (p), and repeat step S 8 if p 7 ^ A. | 

150. We may assume that the elements Aj are distinct, by regarding them as ordered 
pairs The hinted binary search tree, which is a special case of the “Cartesian 

trees” introduced by Jean Vuillemin [CACM 23 (1980)，229-239], has the property that 
is the nearest common ancestor of i and j. Indeed，the ancestors of any given 
node j are precisely the nodes k such that is a right-to-left minimum of _Ai ... Aj 
or Ak is a left-to-right minimum of Aj ... A n . 

The algorithm of the preceding answer does the desired preprocessing, except 
that we need to set up a triply linked tree differently on the nodes {0. 1 ， … ,n}. Start 
as before with CHILD(i?) = SIB(iO = PARENT(?;) = 0 for 0 < v < n, and let A = 0. 
Assume that Ao < Aj for 1 < j < n. Set t 0 and do the following steps for v = 
n — 1， .... 1: Set 0; then while A v < At set u ^ t and t PARENT (t). If # 0, 
set SIB(i0 SIB (u) , SIB(ii) 0， PARENT (u) v. CHILD (i?) u\ otherwise simply 
set SIB(?;) — CHILD (t). Also set CHILD (t) v, PARENT (?;) 

Continue with step S2 after the tree has been built. The running time is O(n), 
because the operation t 4 — PARENT (t) is performed at most once for each node t. [This 
beautiful way to reduce the range minimum query problem to the nearest common 
ancestor problem was discovered by H. N. Gabow. J. L. Bentley, and R. E. Tar j an， 
STOC 16 (1984)，137—138, who also suggested the following exercise.] 

151* For node v with k children ui : ..., Uk, define the node sequence S (i?) = v if 
k = 0] 5 ( 1 ；) = if A: = 1; and S(v) = S(ui)v ... vS(Uk) if k > 1. (Consequently 

v appears exactly max(A: —1.1) times in (?;).) If there are k trees in the forest, rooted at 
ui ， • • • ， life，write down the node sequence S (iii)A ... AS(uk) = Vi .. . Vat. (The length 
of this sequence will satisfy n < N < 2n.) Let Aj be the depth of node V}，for 1 < 
j < 7V，where A has depth 0. (For example, consider the forest ( 141 )，but add another 
child K ^ D and an isolated node L. Then Vi...Vi 5 = CFAGJDHDKABEIAL 
and Ai ... A 15 = 231342323012301.) The nearest common ancestor of u and v, when 
u = Vi and v = Vj^ is then Vk(ij) in the range minimum query problem. [See J. Fischer 
and V. Heun, Lecture Notes in Comp. Sci. 4009 (2006). 36-48.] 

152. Step VI finds the level above which ax and ay have bits that apply to both of 

their ancestors. (See exercise 148.) Step V2 increases if necessary, to the level where 
they have a common ancestor, or to the top level An if they don’t (namely if A: = 0). 
If /3x /3z, step V4 finds the topmost level among x’s ancestors that leads to level h; 

hence it knows the lowest ancestor x for which fix = /3z (or x = A). Finally in V5, 
preorder tells us which of x or y is an ancestor of the other. 

153. That pointer has pj bits, so it ends after pi + p2 + • • • + pj = j — u j bits of the 
packed string，by ( 61 ). [Here j is even. Navigation piles were introduced in Nordic 
Journal of Computing 10 (2003)，238-262.] 

154. The gray lines define 36°-36°-90° triangles, ten of which make a pentagon with 
72° angles at each vertex. These pentagons tile the hyperbolic plane in such a way 
that five of them meet at each vertex. 
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155. Observe first that 0 < (aO)i /0 < (/> 一 1 + (/) 一 3 + (j)~ 5 + • • • =1 ， since there are no 
consecutive Is. Observe next that F 一 n 小三小 _ n (modulo 1)，by exercise 1.2.8-11. Now 
add Fk ± (/>+•• •- \-Fk r (f>. For example, (4(/)) mod 1 = </) 一 5 + 彡一 2 ; (—2(/>) mod 1 = (/> 一 4 +</) 一 1 • 

This argument also proves the interesting formula L_/V(a)(/)」= —N(aO). 

156. (a) Start with y 0, and with k large enough that \x\ < Fk+i. If x < 0. set 
k <— (A: — 1) I 1, and while x + > 0 set k i— k — 2; then set y y + (1 《 A:)， 
x x - Ffc+i; repeat. Otherwise if x > 1， set A: A: & —2，and while x — Fk < 0 set 
k i— k — 2] then set y i— y (1 ^ A:), x i— x — Ffc+i; repeat. Otherwise set y y -\- x 
and terminate with y = (a)2. 

(b) The operations x\ ^ ai, yi —ai, Xk yk-i + ctk, yk Xk^i — Xk 
compute Xk = N ... a^) and yk = N[a\ ... a^O). [Does every broad word chain for 
N(ai ... a n ) require Q(n) steps?] 

157. The laws are obvious except for the two cases involving (a—). For those we have 
7V((a—)0 fc )= 7V(a0 fc ) + F-k -2 for all A: > 0, because decrementation never ‘‘borrows’’ 
at the right. (But the analogous formula 7V((a+)0 fc )= N (€^)+ F 一 k_i does not hold.) 

158. Incrementation satisfies the rules (a00)+ = aOl. (alO) + = (a+)00，（al)+ = 
(a+)0. It can be achieved with six 2-adic operations on the integer x = (a)2 by setting 
y x I (x 》 1) ， 2 : l y & 〜 (y + 1)，x l (x I 2 :) + 1. 

Decrementation of a nonzero codeword is more difficult. It satisfies (al0 2fc )—= 
a0(10) fc ，(al0 2fc+1 )— = a(01) fc+1 ; hence by Corollary I it cannot be computed by a 
2-adic chain. Yet six operations suffice, if we allow monus: y x — 1, z y&x， 
w ^ z /io, x^y — w-\-(w — [z — w)). 

159. Besides the Fibonacci number system (146) and the negaFibonacci number sys¬ 
tem (147)，there’s also an odd Fibonacci number system: Every positive integer x can 
be written uniquely in the form 


x = Fi x + Fi 2 + • • • + Fi s ， 


where Zi 今 Z 2 > … L > 0 and l s is odd. 


Given a negaF ibonacci code a, the following 20-step 2-adic chain converts x = { 0^)2 to 
y = (/3)2 to z = (7)2，where f3 is the odd codeword with N(a) = F(/3) and 7 is the 
standard codeword with F(J3) = F^O): x + x & /io, 尤一 —尤㊉ x + ; d x + — x ~; 
t ^ d\x~ ^ t ^ th 〜 (t 《 1); y t (d fl 0 ) ㊉ f ㊉ （(t & x~) 》 1); z (y + 1) 》 1; 
w I 2 ： ㊉ (4/io )； t ^ W k, ~(?i ； + l); 2 ： 2 ： ㊉ (t & (2 ： ㊉ ((W + l) 》 1))). 

Corresponding negaF ibonacci and odd representations satisfy the remarkable law 

F kl +m H - h F kr+rn = (-l) m (F Zl ^ m H - h F ls - m ), for all integers m. 

For example, if N(a) < 0 the steps above will convert x = (a0)2 to y = (/3)2，where 
F((/3 》2)0) = —N(a). Furthermore /3 is the odd code for negaFibonacci a if and only 
if a R is the odd code for negaF ibonacci /3 R . when \a\ = |/3| is odd and N(a) > 0. 

No finite 2-adic chain will go the other way，by Corollary I. because the Fibonacci 
code 10 fc corresponds to negaF ibonacci 10 fc+1 when k is odd, (10) fc / 2 l when k is even. 
But if 7 is a standard Fibonacci codeword we can compute y = (^)2 from 2: = (7)2 by 
setting y — 2：《l，t4— — 1)& /io, y y — t [t^： 0]((t— 1) & /io)- And then 

the method above will compute a R from /3 R . The overall running time for conversion 
to negaF ibonacci form will then be of order log 卜|， for two string reversals. 

160. The text’s rules are actually incomplete: They should also define the orientation 
of each neighbor. Let us stipulate that a S n = OL eri = a; (a0) i/ ；n = (al) wo = «1; 

(aOO)ns = aOO，(alO)ntx； = alO， (al)ne = al; (a0) oo = 0^0， (alOl)oo = alOl^ 
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(alOOl)oo = c^lOOl. (aOOOl)otx ； = c^OOOl. Then a case analysis proves that all cells 
within d steps of the starting cell have a consistent labeling and orientation, by induc¬ 
tion on the graph distance d. (Note the identity a- {- = ((aO) —) 》 1.) Furthermore the 
labeling remains consistent when we attach y coordinates and move when necessary 
from one strip to another via the (5-rules of ( 153 ). 

161. Yes, it is bipartite, because all of its edges are defined by the set of boundary 
lines. (The hyperbolic cylinder cannot be bicolored; but two adjacent strips can.) 

162. It’s convenient to view the hyperbolic plane through another lens. / 

by mapping its points to the upper halfplane ^sz > 0. Then the “straight 。 々'、 / 

\ '、、 r \ ^ 

lines” become semicircles centered on the x-axis，together with vertical j 
halflines as a limiting case. In this representation, the edges 丨之一 1| = y/2. j / \ 

卜 | = r，and Ikz = 0 define a 36 。 - 45 。 -90。 triangle if r 2 = (/) + Every b 4^- - 4 4 

triangle ABC has three neighbors CBA 1 . ACB 1 . and BAC 1 ^ obtained ' ,, 

by “reflecting” two of its edges about the third, where the reflection of 

| 2 ： — c ; | — r about \z — c\ = r is \z — c — | + 奶 ) | = | \x\ — ^ 2 1, Xj = r 2 / (c 士 r’ 一 c). 

The mapping 2 ： ^ (z — zo)/(z — zo) takes the upper halfplane into the unit circle; 
when zo = — 1/ + 5" 4 <) the central pentagon will be symmetric. Repeated 

reflections of the initial triangle, using breadth-first search until reaching triangles that 
are invisible, will lead to Fig. 14. To get just the pentagons (without the gray lines), 
one can begin with just the central cell and perform reflections about its edges, etc. 

163. (This figure can be drawn as in exercise 162, starting with vertices that project to 
the three points zr, zrcj, and ircu 2 ， where r 2 = ^ (1 + y/2) (4 — y/2 — \/6) and oj = e 2?r " 3 . 
Using a notation devised by L. Schlafli in 1852, it can be described as the infinite tiling 
with parameters {3. 8}，meaning that eight triangles meet at every vertex; see Schlafli’s 
Gesammelte Mathematische Abhandlungen 1 (1950), 212. Similarly, the pentagrid and 
the tiling of exercise 154 have Schlafli symbols {5,4} and {5,5} ， respectively.) 

164. The original definition requires more computation, even though it can be factored: 




=e 


2ttz/3 
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custer’ ㈤ =X ^ 〜 (: K N & Y s ), 




But the main reason for preferring ( 157 ) is that it produces a thinner, kingwise con¬ 
nected border. The rookwise connected border that results from the 1957 definition is 
less attractive, because it’s noticeably darker when the border travels diagonally than 
when it travels horizontally or vertically. (Try some experiments and you’ll see.) 

165. The first image X ⑴ is the “outer” border of the original black pixels. Fingerprint- 
like whorls are formed thereafter. For example, starting with Fig. 15(a) we get 



in a 120 X 120 bitmap, eventually alternating endlessly between two bizarre patterns. 
(Does every nonempty M X N bitmap lead to such a 2-cycle?) 

166* If X = custer(X), the sum of the elements of X+(X^1) + (X^1) + (X^1) + (X^1) 
is at most 4MN + 2M + 27V ， since it is at most 4 in each cell of the rectangle and at 
most 1 in the adjacent cells. This sum is also five times the number of black pixels. 
Hence f(M : N) < ^MN + |M + | 见 Conversely we get f(M^N) > |M7V — | by 
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letting the pixel in row i and column j be black unless (i + 2j) mod 5 = 2. (This 
problem is equivalent to finding a minimum dominating set of the M X N grid.) 


167* (a) With 17 steps we can construct a half adder and three full adders (see 7.1-2— 
( 23 )) so that (2:12:2)2 = x NW +x w -\-x S w, (2:32:4)2 = x N +x s ， (z 5 z 6 )2 = ^ne + +$se ， 

and (2:72:3)2 = Z2 + Z4 -h Z 6 . Then / = *Si 2:3, 之 5, 之 7) A (xV where the symmetric 
function /1 needs seven operations by Fig. 9 in Section 7.1.2. [This solution is based 


on ideas of W. F. Mann and D. Sleator. 


(b) Given x_ = x = and x + = compute a x~ Sz x+ (= Z3), 

X 一 ㊉ ； (=2:4 夂 Cf-X ㊉ C 》1 (=2 ： 6) ， Cf-C《l (=2 ： 2),ef-C ㊉ C &4 

f<—fjc (= Z7), e 6 ㊉ e (= 2 8 )， c<—x&:b^c<—cja^ b <— c 1 (= 2:5), 
c 卜 c 》 1 (= z\\ (i — 6&c，c — 6|c ， 6—a&/，/ — a|/，/ — d|/ ， c—6|c ， 
/ I / © C (= Si(z 1 ,Z3,z 5 ,z 7 )), e ^ e \ x, / & e. 

[For excellent summaries of the joys and passions of Life, including a proof that 
any Turing machine can be simulated, see Martin Gardner, Wheels. Life and Other 
Mathematical Amusements (1983)，Chapters 20-22; E. R. Berlekamp, J. H. Conway, 
and R. K. Guy, Winning Ways 4 (A. K. Peters. 2004)，Chapter 25.] 


At last / Ve got what I wanted — an apparently unpredictable law of genetics. 

Overpopulation, like underpopulation, tends to kill. 
A healthy society is neither too dense nor too sparse. 


— JOHN H. CONWAY, letter to Martin Gardner (March 1970) 

168. The following algorithm, which uses four n-bit registers x~ ^ x : x +， and works 
properly even when M = 1 or TV = 1. It needs only about two reads and two writes 
per raster word to transform to X( t+1 ) in ( 158 ): 

Cl. [Loop on k.] Do step C2 for A: = 1. 2, , TV ’； then go to C5. 

C2. [Loop on j.] Set x -Ao/c, and AMk . Then perform 

steps C3 and C4 for j = 0， 1 ， …， M — 1 . 

C3. [Move down.] Set x~ 4— x, x 4— x + . and x + <— (Now x = Ajk^ and 

x— holds the former value of -A(j-i)fc-) Compute the bitwise function values 

y f(x~ 》 l,x— 《 l 5 x 》《 l 5 x + 》 l ， x + ,x+ 《 1). 

C4* [Update Set x i — Aj ^1 ^ & — 2^ y i — y & (2 n — 1 )， 一 i) i — 

x~ (y^> (n- 2 )), Ajk ^ y ~\r (x ~ 《 （n - 2 )). 

C5. [Wrap around.] For 0 < jf < M. set x •<— Ajj^f & —2 n _ 1 -d ， AjN，<— x -\- 
(Aji 》 d)，and Aji <— Aji + (x 《 d)，where d = 1 (N — 1) mod (n — 2). | 

An M X N torus is equivalent to an (M — 1) X (TV — 1) array surrounded by zeros, 
in many cases like ( 157 ) and ( 159 ) and even ( 161 ). For exercise 173 we can clean an 
(M — 2) X (TV — 2) array that is bordered by two rows and columns of zeros. But Life 
images (exercise 167) can grow without bound; they can’t safely be confined to a torus. 

169. It quickly morphs into a rabbit, which proceeds to explode. Beginning at time 
278， all activity stabilizes to a two-cycle formed from a set of traffic lights and three 
additional blinkers, together with three still lifes (tub ， boat，and bee hive). 

170* If M > 2 and N > 2. the first step blanks out the top row and the rightmost 
column. Then if M > 3 and N > 3. the next step blanks out the bottom row and the 
leftmost column. So in general we’re left after t = min(M, N) — 1 steps with a single 
row or column of black pixels: The first \t/2~\ rows, the last \t/2\ columns, the last 
p/2 」 rows, and the first p/2 」 columns have been set to zero. The automaton will stop 
after making two more (nonproductive) cycles. 
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171. Without ( 160 ): x\ x SE & x N . X 2 <— x N &: x SE ， X 3 <— x E xi : X 4 <— x NE & 元 2 , 

尤 5 — 尤 3 I X 4 . XQ X W & X 5 ， 尤 7 — 尤 1 & X N E ， 尤8 ^7 & 元 NW， ^9 | 尤 SW ， 

X 10 x 8 & 尤 9 ， Xu 4 - x 10 I x 6 , x 12 4- x s & Xn^ x 13 x 2 & 无 E ， X 14 4 - x 13 & 

Xl5 ^r- X N ^ Tne ， X 16 X S W & ， ^17 ^ ^15 | 尤 16, ^18 尤 NE & 尤 SW ， ^19 ^17 

X 2 0 X K I X S E, ^21 ^ X 2 0 \ X S , X 2 2 ^ ^NW & ^ 21 , ^23 尤 22 & ^19, ^24 尤 12 | 尤 14, 

g *<— X 23 I X 2 A- With ( 160 ), set X 4 <— x NE & x N and leave everything else the same. 

172. The statement isn’t quite true; consider the following examples: 




The ‘I’ and at the left show that pixels are sometimes left intact where paths join, 
and that rotating by 90° can make a difference. The next two examples illustrate 
a quirky influence of left-right reflection. The diamond example demonstrates that 
very thick images can be unthinnable; none of its black pixels can be removed without 
changing the number of holes. The final examples, one of which was inspired by the 
answer to exercise 166， were processed first without ( 160 ), in which case they are 
unchanged by the transformation. But with ( 160 ) they’re thinned dramatically. 

173* (a) If X and Y are closed ，X Y is closed; if X and Y are open，X | y is 
open. The hinted statement follows. Furthermore X DD = X D , because X D is closed; 
similarly X LL = X L . (In fact we have X L = 〜 ( 〜 X) D ， because the definitions are dual, 
obtained by swapping black with white.) Now X DL C X D ^ so X DLD C X DD = X D . 
Dually, X L C X LDL • We conclude that there’s no reason to launder a clean picture: 

_ X DL 

(b) We have = (X\X W \ X NW | X N ) & (X | X N | X NE | X E ) & (X | X E | X SE | X s ) & 

(X I X s I Xsw I X w )- Furthermore, in analogy with answer 167(b)，this function can be 
computed from x~ , and x + in ten broadword steps: f i— x \ (x 》 l) | (($ 一 | 一》 1 ))& 
{x + I {x + 》 l))) ， /*f- / & (/ 《 1). [This answer incorporates ideas of D. R. Fuchs.] 

To get X L , just interchange | and &. For further discussion, see C. Van Wyk 
and D. E. Knuth. Report STAN-CS-79-707 (Stanford Univ. ， 1979), 15-36.] 

174. Three-dimensional digital topology has been studied by R. Malgouyres ， Theoret¬ 
ical Computer Science 186 (1997) ， 1—41. 

175* There are 25 in the outline，2 + 3 in the eyes, 1 +1 in the ears, 4 in the nose, and 
1 in the smile，totalling 37. (All white pixels are connected kingwise to the background.) 

176. (a) If v isn’t isolated, there are eight easy cases to consider, depending on what 
kind of neighbor v has in G. 

(b) There’s a vertex w 1 G G 1 adjacent to each vertex of N u U N v . (Four cases.) 

(c) Yes. In fact，by definition ( 161 ). we always have |aS / (^ / )| > 2. 

(d) Let N ! v f = {i? I G N v }. If v is the east neighbor of u , call it either 
u ^ G 01 u s G G; this element is adjacent to every vertex of N f u f U N’ v ，• A similar 
argument applies when v f = u f N . If v 1 = i^ E ， there’s no problem if u 1 G G. Otherwise 
u w G G, G G，and either ^ ^ or ^ G; hence N^, U N^, is connected in G. 
Finally if v = the proof is easy if u s G G\ otherwise u ^ G and v G G. 

(e) Given a nontrivial component C of G, with v ^ C and v 1 G S let C 1 be the 
component of G 1 that contains v . This component C 1 is well defined，by (a) and (b). 
Given a component C ! of G’，with v G C 1 and v G aS’(V) ，let C be the component of 
G that contains v. This component C is nontrivial and well defined, by (c) and (d). 
Finally, the correspondence C ^ C f is one-to-one. 
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177* Now the vertices of G are the white pixels, adjacent when they are root-neighbors. 
So we define TV(i j) = {(z, jf), (i—l.j). (z, jf+1)}. Arguments like those of answer 176, 
but simpler, establish a one-to-one correspondence between the nontrivial components 
of G and the components of G 1 . 

178. Observe that in adjacent rows of X*, two pixels of the same value are kingwise 
neighbors only if they are rookwise connected. 

179. The pixels of each row x\ . •. xn can be a runlength encoded” as a sequence of 
integers 0 = c。< ci 〈… < C 2 m+i = TV + 2 so that Xj = 0 for j G [Co • • ci) U [C 2 • • C 3 ) U 
• • • U [c 2 m - - c 2 m+i) and Xj = 1 for j G • • c 2 ) U • • • U [c 2 m-i • - c 2m ). (The number of 
runs per row tends to be reasonably small in most images. Notice that the background 
condition xo = xn+i = 0 is implicitly assumed.) 

The algorithm below uses a modified encoding with aj = 2cj — (jf mod 2) for 
0 < j < 2m + l. For example, the second row of the Cheshire cat has (ci ， C 2 , C 3 ， C 4 , C 5 )= 
(5, 8 , 23, 25. 32); we will use (ai ， a 2 , as) = (9, 16,45, 50, 63) instead. The reason is 

that white runs of adjacent rows are rookwise adjacent if and only if the corresponding 
intervals [aj .. flj+i) and [bk . . &fc+i) overlap, and exactly the same condition charac¬ 
terizes when black runs of adjacent rows are kingwise adjacent. Thus the modified 
encoding nicely unifies both cases (see exercise 178). 

We construct a triply linked tree of components, where each node has several 
fields: CHILD, SIB，and PARENT (tree links); DORMANT (a circular list of all children that 
aren’t connected to the current row); HEIR (a node that has absorbed this one); ROW and 
COL (location of the first pixel); and AREA (the total number of pixels in the component). 

The algorithm traverses the tree in double order (see exercise 2.3.1—18), using 
pairs of pointers (P ， P’），where P 7 = P when P is traversed the first time. P 7 = PARENT (P) 
when P is traversed the second time. The successor of (P ， P’）is (Q ， Q’）= next(P ， P ’)， 
determined as follows : If P = P 7 and CHILD (P) ^ A, then Q Q f <— CHILD(P); otherwise 
Q — P and Q’ — PARENT(Q). If P ^ P 7 and SIB(P) / 八 ， then Q 卜 Q' 4- SIB(P); 
otherwise Q 4 - PARENT (P) and Q 7 PARENT(Q). 

When there are m black runs, the tree will have m +1 nodes, not counting nodes 
that are dormant or have been absorbed. Moreover, the primed pointers P’ 1 ? … ， P; m+ i 
of the double traversal (Pi.Pi), … ， （ P 2 m+i ， P;m+i) are precisely the components of 
the current row, in left-to-right order. For example，in ( 163 ) we have m = 5; and 
(Pi,--. ,P’ii) point respectively to @ ，❽，①，❽， © ， © ， © ， & ，②，❹， (g). 

II- [Initialize.] Set t 卜 1, ROOT 卜 L0C(N0DE(0)) ， CHILD (ROOT ) 卜 SIB (ROOT) 4- 
PARENT(RQQT) DORMANT(ROOT) HEIR(ROQT) A; also RQW(ROOT) 4- 
CQL(RQOT) 0, AREA (ROOT) 4- TV + 2. s 0, a 0 4- 6 0 0, ai 27V + 3. 

12. [Input a new row.] Terminate if s > M. Otherwise set bk dk for A; = 1 ， 2, 

…， until bk = 27V+3; then set bk+i as a “stopper.” Set s s+1. If s > 

M, set ai 2N + 3; otherwise let ai ， … ， a 2 m+i be the modified runlength 
encoding of row s as discussed above. (This encoding can be obtained with 
the help of the p function; see ( 43 ).) Set j k 1 and P 4— P 7 ROOT. 

13. [Gobble up short b’s.] If bk+i > go to 19. Otherwise set (Q ， Q’)*<— 
next(P ， P’) ， (R ， R’）•<— next(Q ， Q’)，and do a four-way branch to (14, 15, 16,17) 
according as 2[Q # Q’] + [R^R 7 ] = (0, 1 ， 2, 3). 

14. [Case 0.] (Now Q = Q’ is a child of P’，and R = R’is the first child of Q’. Node Q 

will remain a child of P’，but it will be preceded by any children of R.) Absorb 
R into P 7 (see below). Set CHILD(Q) SIB(R) and Q 7 CHILD(R). If QV 八， 
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set R 卜 Q’，and while R 7 ^ A set PARENT(R ) 卜 P’ ， R/ 卜 R，R 卜 SIB(R); then 
SIB(R) —Q，Q — Q’. Set CHILD(P) —Q if P = P’ ， SIB(P) —Q if P^P 7 . Go to 18. 

15. [Case 1.] (Now component Q = R is surrounded by P 7 = R’.）If P = P’，set 
CHILD(P) — SIB(Q); otherwise set SIB(P) SIB(Q). Set R 4 - DORMANT(R 7 ). 
Then if R = A, set DORMANT (R 7 ) 4 - SIB(Q) — Q; otherwise SIB(Q) 4 - SIB(R) 
and SIB(R) Q. Go to 18. 

16. [Case 2.] (Now is the parent of both P 7 and R. Either P = P 7 is childless, or 
P is the last child of P 7 .) Absorb R into P 7 (see below). Set SIB(P’ ） 4 — SIB (R) 
and R 卜 CHILD(R). If P = P’，set CHILD(P ) 卜 R; otherwise SIB(P ) 卜 R. 
While R ^ A. set PARENT(R) ^ P 7 and R ^ SIB(R). Go to 18. 

17. [Case 3.] (Node P 7 = Q is the last child of Q’ = R，which is a child of R ’.） 
Absorb P 7 into R 7 (see below). If P = P 7 , set P R. Otherwise set P 7 
CHILD (P’），and while P’ # 八 set PARENT (P’ ） R’ ， P’ 4- SIB(P ’）； also 
set SIB(P) 4 - SIBCQ 7 ) and SIB(Q 7 ) 4 - CHILD(Q). If Q = CHILD(R), set 
CHILD(R) A. Otherwise set R CHILD(R), then R — SIB(R) until 
SIB(R) = Q，then SIB(R) A. Finally set P 7 — R’. 

18. [Advance A:.] Set k k 2 and return to step 13. 

19. [Update the area.] Set AREA(P ’） 卜 AREA(P’）+ \aj /2] — \aj Then go 
back to 12 if a) = 27V + 3. 

110. [Gobble up short a.] Ifa J+ i > 6 ^, go to Ill. Otherwise set Q 卜 LOC(NODE ⑴） 

and t t -\-l. Set PARENT(Q) P 7 , DORMANT(Q) ^ HEIR(Q) A; also 

RQW(Q) ^ s, CQL(Q) 4 - \aj/2], AREA(Q) — \a j+1 /2] - \aj/2]. If P = P’，set 
SIB(Q) 4 - CHILD(P) and CHILD(P) 4 - Q; otherwise set SIB(Q) — SIB(P) and 
SIB(P) 4 — Q. Finally set P Q, jf jf + 2, and return to 13. 

111. [Move on.] Set j <— j 1. k <— k 1^ (P ， P’) next(P ， P’)，and go to 13. | 

To “absorb P into Q” means to do the following things: If (ROW(P), COL(P)) is less 
than (R0W(Q) ， C0L(Q)), set (ROW(Q) ， COL(Q)) — (ROW(P) ， COL(P))• Set AREA(Q) 4- 
AREA(P) + AREA(Q). If DORMANT(Q) = A, set DORMANT(Q ) 卜 DORMANT(P); otherwise if 
DORMANT (P) ^ A, swap SIB (DORMANT (P)) ^ SIB (DORMANT (Q)). Finally, set HEIR(P) ^ 
Q. (The HEIR links could be used on a second pass to identify the final component of 
each pixel. Notice that the PARENT links of dormant nodes are not kept up to date.) 

[A similar algorithm was given by R. K. Lutz in Comp. J. 23 (1980) ， 262—269.] 

180. Let F{x^ y) = x 2 — y 2 13 and Q(x ， y) = F(x — \^y— \) = x 2 — y 2 — x + y + 1?>. 
Apply Algorithm T to digitize the hyperbola from (^. rf) = (—6, 7) to . r/) = (0, a/13 )； 
hence x = —6 ，y = 7^ x f = 0^ y = 4. The resulting edges are (—6, 7) — (—5, 7) — 

(—5, 6 ) —— (—4, 6 ) —— (—4, 5) —— (—3,5) —— (—3,4) - - - (0,4). Then apply it again 

with g = 0, w f = 6 , ?/ = 7, x = 0, y = 4, a/ = 6 , ^ / = 7; the same edges are 

found (in reverse order), but with negated x coordinates. 

181. Subdivide at points rj) where F x d rf) = 0 or rf) = 0, namely at the real 

roots of {Q(-(brj + d)/(2a),rj + |) = 0, ^ = -(brj + d)/(2a) - or the real roots of 
{QK + |,-(k + e )/( 2c )) = 0, W = -(^ + e )/( 2c ) - |} 5 if they exist. 

182. By induction on \x ! — x\ -\- \y — y\. Consider, for example, the case x < x 
and y 1 > y. We know from (iii) that rj) lies in the box x — ^ < ^ < x + | and 
y — 2 — 2 ^ ail d from (ii) that the curve travels monotonically as it moves from 

rj) to ($’，?/)• It must therefore exit the box at the edge (x—^y—^) — (x— |, y-\- 1) 
or (x — I， y + I) —— (x + + |). The latter holds if and only if F(x — 臺 , y + |) < 0, 

because the curve can’t intersect that edge twice when x < x. And F{x — + ^) is 
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the value Q{x, y -\-1) that is tested in step T3，because of the initialization in step Tl. 
(We assume that the curve doesn’t go exactly through (x — 去 ， y + |)， by implicitly 
adding | to the function F behind the scenes.) 

183* Let Q 1 y) = —1 — Q(x ， y)• The key point is that Q(x ， y) < 0 if and only if 
Q 1 y) > 0. (Curiously the algorithm makes the same decisions ， backwards, although 
it probes the values of Q’ and Q in different places.) 

184. Find a positive integer h so that d = (r/ — r/)h and e = — C)h are integers and 

(i + e is even. Then carry out Algorithm T with x = |^ + 臺 」， y = [r/ + | 」， x’ = L4’ + I 」， 
y f = W + I」， an d Q(x ， y) = d(x - |) + e(y - |) + /, where 

/ = L(VC — — [<^> 0 and (r/f — ^'ri)h is an integer]. 

(The c d > term ensures that the opposite straight line, from « ， r/) back to (^. rf), will 
have precisely the same edges; see exercise 183.) Steps Tl and T6-T9 become much 
simpler than they were in the general case, because R = d and S = e are constant. 

(F. G. Stockton [CACM 6 (1963) ， 161 ， 450] and J. E. Bresenham [IBM Systems 
Journal 4 (1965). 25—30] gave similar algorithms, but with diagonal edges permitted.) 

185. (a) -B(e) = zo 2e(^i — zo) + 0(e 2 ); B{\ — e) = Z 2 — 2e(z2 — ^ 1 ) + 0(e 2 ). 

(b) Every point of S(zo. zi : Z 2 ) is a convex combination of zo, z\. and Z 2 . 

(c) Obviously true, since (1 — t) 2 + 2(1 — t)t + t 2 = 1. 

(d) The collinear condition follows from (b). Otherwise, by (c)，we need only 
consider the case zo = 0 and Z 2 — 2zi = 1, where zi = xi -iyi and y\ ^ 0. In that 
case all points lie on the parabola Ax = {y/yi) 2 + 4 ： yxi/yi. 

(e) Note that B(u0) = {l — u) 2 z^2u{l — u){[l — 0)zo J r0z\)-\-u 2 B{0) for 0 < < 1. 

[S. N. Bernshtein introduced B n (zo, zi,... , z n ;t) = (1 — t) n ~ k t k Zk in 

Soobshchenifh Khar’kovskoe matematicheskoe obshchestvo (2) 13 (1912), 1—2.] 

186. We can assume that zo = (xo, yo) : z\ = {x\. yi), and Z 2 = ( 尤 2 ，以 2 )， where the 
coordinates are (say) fixed-point numbers represented as 16-bit integers divided by 32. 

If zo : 2 ： i，and Z 2 are collinear, use the method of exercise 184 to draw a straight 
line from zo to Z 2 . (If z\ doesn’t lie between zo and Z 2 , the other edges will cancel out ， 
because edges are implicitly XORed by a filling algorithm.) This case occurs if and 
only if D = + x^ 2 + x 2 yo - ^iyo - x 2 yi - x 0 y 2 = 0. 

Otherwise the points (x. y) of S"(2 ： o, 之 1 ，之 2 ) satisfy F{x, y) = 0, where 

F ( x ^y) = (( x - x 0 )(y 2 - 2 yi + yo) - {y - yo)(x 2 - 2x x + x 0 )) 2 

— 4D((xi - x 0 )(y-y 0 ) - ( Vl - y 0 )(x - x 0 )) 

and D is defined above. We multiply by 32 4 to obtain integer coefficients; then negate 
this formula and subtract 1， if L) < 0, to satisfy condition (iv) of Algorithm T and the 
reverse-order condition. (See exercise 183.) 

The monotonicity condition (ii) holds if and only if (xi — ;ro )(^2 — xi) > 0 and 
(yi — yo)(y 2 — yi) > 0. If necessary, we can use the recurrence of exercise 185(e) 
to break *S(2 ： o, 之 1 ，之 2 ) into at most three monotonic subsquines; for example, setting 
6 = (xo — xi)/ (xo — 2xi + X 2 ) will achieve monotonicity in x. (A slight rounding error 
may occur during this fixed point arithmetic, but the recurrence can be performed in 
such a way that the subsquines are definitely monotonic.) 

Notes: When zo, 2 : 1 ， and Z 2 are near each other, a simpler and faster method based 
on exercise 185(e) with 0 = | is adequate for most practical purposes, if one doesn’t 
care about making the exactly correct choice between local edge sequences like a up- 
then-left” versus “left-then-up •” In the late 1980s，Sampo Kaasila chose to use squines 
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as the basic method of shape specification in the TrueType font format, because they 
can be digitized so rapidly. The META FONT system achieves greater flexibility with 
cubic Bezier splines [see D. E. Knuth ， METRFONT: The Program (Addison-Wesley, 
1986)]，but at the cost of extra processing time. A fairly fast “six - register algorithm” 
for the resulting cubic curves was ， however, developed subsequently by John Hobby 
[ACM Trans, on Graphics 9 (1990), 262-277]. Vaughan Pratt introduced conic splines, 
which are sort of midway between squines and Bezier cubics. in Computer Graphics 
9, 3 (July 1985), 151—159. Conic spline segments can be elliptical and hyperbolic as 
well as parabolic, hence they require fewer intermediate points and control points than 
squines; furthermore, they can be handled by Algorithm T. 


187. The following big-endian program assumes that n < 74880. 



L0C 

Data_Segment 


LD0 

k,Initk 


BITMAP 

LQC 

_*N/8 

OH 

SET 

s ， N/64 


base 

GREG 


1H 

SET 

a，h 

A trick (see below) 

GRAYMAP 

L0C 

@+M*N/64 


SET 

r,8 


GTAB 

BYTE 

255,252,249,246, 243 

2H 

LD0U 

t ， base,k 



BYTE 

240,236,233,230, 227 


MOR 

u ， cl，t 



BYTE 

224,221,217,214,211 


SUBU 

t, t ,u 

(Nypwise sums) 


BYTE 

208,204,201,198,194 


MOR 

u,c2 ,t 



BYTE 

191,188,184,181,178 


AND 

t ，rmil 



BYTE 

174,171 ， 167,164, 160 


ADDU 

t, t ,u 

(Nybblewise sums) 


BYTE 

157,153,150,146, 142 


MOR 

ii ， c3，t 



BYTE 

139,135,131 ， 128, 124 


AND 

t,t,mu2 



BYTE 

120 ， 116 ， 112 ， 108，104 


ADDU 

t, t ,u 

(Bytewise sums) 


BYTE 

100,96,92,88,84 


ADDU 

a,a,t 



BYTE 

79,75,70,66,61 


INCL 

k ， N/8 

Move to next row. 


BYTE 

56,52,46,41，36 


SUB 

r ， r，l 



BYTE 

30,24 ， 18 ， 10,0 


PBNZ 

r ， 2B 

Repeat 8 times. 

Initk 

QCTA 

BITMAP-GRAYMAP 

3H 

SRU 

t ， a，56 


corr 

GREG 

N-8 


LDBU 

t,gtab,t 


cl 

GREG 

#4000100004000100 


SLU 

a ， a，8 


c2 

GREG 

#2010000002010000 


STBU 

t ， z，0 


c3 

GREG 

#0804020100000000 


INCL 

Z，1 


mill 

GREG 

#3333333333333333 


PBN 

a ， 3B 

(The trick) 

mu2 

GREG 

#0f0f0f0f0f0f0f0f 


SUB 

k ， k，corr 


h 

GREG 

#8080808080808080 


SUB 

s ， s，l 


gtab 

GREG 

GTAB-#80 


PBNZ 

s，lB 

Loop on columns. 


L0C 

#100 


INCL 

k ， 7*N/8 

Loop on groups 

MakeGray 

LDA 

z,GRAYMAP 


PBN 

k ， 0B 

of 8 rows. | 


[Inspired by Neil Hunt’s DVIPAGE. the author used such gray maps extensively 
when preparing new editions of The Art of Computer Programming in 1992—1998.] 


188. If the rows of the bitmap are (Xo ， Xi ， … ， X 63 )，do the following operations for 
A ： = 0 ， 1 ， … ， 5: For all i such that 0 < i < 64 and i Sz2 k = 0, let j = i -\-2 k and either 
(a) set t •<— ㊉ (Xj 》 2 fc )) & Xi t Xi ㊉ t，•<— ■㊉ （ t 《 2 fc ); or (b) set 

t •<— Xi U •<— ((Xi <c 2 fc ) & ft6,k) \ Xj •<— ((X j 》 2 fc ) & I 亡 . 

[The basic idea is to transform 2 k X 2 k submatrices for increasing as in exercise 


5—12. Speedups are possible with MMIX，using M0R and MUX as in exercise 205, and using 
LDTU/STTU when k = 5. See L. J. Guibas and J. Stolfi. ACM Transactions on Graphics 
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1 (1982). 204—207. Incidentally. Theorem P and answer 54 show that Q(nlogn) 
operations on n-bit numbers are needed to transpose an nxn bit matrix. An application 
that needs frequent transpositions might therefore be better off using a redundant 
representation, maintaining its matrices in both normal and transposed form. 

189 . (a) We must have o^+i = f(cyj) ® c^j-i for j > 1， where ao = 0 … 0 and 
f(a) = ((a 《 l)&l...l)®a®(a 》 l). The elements of the bottom row satisfy 
the parity condition if and only if this rule makes a m +i entirely zero. 

(b) True. The parity condition on matrix entries is ㊉ ㊉ 

© a (i+i)j) where aij = 0 if i = 0 or z = m + 1 or j = 0 or j = n + 1. If two 

matrices (a^) and (bij) satisfy this condition, so does (cij) when Cij = aij ® bij. 

(c) The upper left submatrix consisting of all rows that precede the first all-zero 
row (if any) and all columns that precede the first all - zero column (if any) is perfect. 
And this submatrix determines the entire matrix, because the pattern on the other side 
of a row or column of zeros is the top/bottom or left / right reflection of its neighbor. 
For example, if a m / +1 is zero, then a m / +1+ j = a m / +1 —j for 1 < j < m — m!. 

(d) Starting with a given vector and using the rule in (a) will always lead to 

a row with a m +i = 0 ... 0. Proof: We must have (aj ， o^+i) = (afc, afc+i) for some 0 < 
j < k < 2 2n ，by the pigeonhole principle. If j > 0 we also have (aj-i, aj) = (afc—i ， o ^)， 
because c^j-i = ® a j+i = f( a k) ® <^k+i — Therefore the first repeated 

pair begins with a row ak of zeros. Furthermore we have ol{ = otk-i for 0 < i < A:; 
hence the first all-zero row a m +i occurs when m \s k — 1 oy k/2 — 1. 

Rows Qfi ，…， am will form a perfect pattern unless there is a column of Os. There 
are t > 0 such columns if and only if t + 1 is a divisor of n + 1 and a\ has the form 
... 0a (t even) or aOo^O … 0a R (t odd)，where \a\ 1 = (n + l)/(t + l). 

(e) This starting vector does not have the form forbidden in (d). 

190 . (a) The former is 叫，吻 ，… if and only if the latter is OaiOo^ ， Oo^Oo^， …. 

(b) Let the binary string aoai ... a^v-i correspond to the polynomial ao + a\x + 

••• + a^^ 1 x N ~ 1 ^ and let y = x^ 1 - \-l-\-x. Then ao = 0 ... 0 corresponds to Fo(y)] 
ai = 10... 0 corresponds to F\(y)\ and by induction aj corresponds to Fj (y ), mod 
x N + 1 and mod 2. For example, when TV = 6 we have ol^ = 110001 1 + x + x 5 

because x 一 1 mod (x 6 + 1 ) = x 5 , etc. 

(c) Again, induction on j. 

(d) The identity in the hint holds by induction on m，because it is clearly true 
when m = 1 and m = 2. Working mod 2， this identity yields the simple equations 


lower bounds 
redundant representation 
pigeonhole principle 
continuant polynomial 
Chebyshev polynomial 


F 2 k(y) = yF k (y) 2 ; F 2 k-i(y) = (F k ^i(y) + F k (y)) 2 . 


So we can go from the pair Pk = (Fk-i(y) mod Fk (y) mod (x Ar +l)) to the pair 

Pfc+i in 0(n) steps, and to the pair in 0(n 2 ) steps. We can therefore compute 
Fj(y) mod (x N + 1) after O(logj) iterations. Multiplying by / a (x) + / a (x _1 ) and 
reducing mod x N -\- 1 then allows us to read off the value of aj. 

Incidentally, is the special case K n {x, ... .x) of a continuant polyno¬ 
mial ； see Eq. 4.5.3—( 4 ). We have F n+ i(x) = ( n ^ k )x n ~ 2k = i~ n U n (ix/2)^ where 

U n is the classical Chebyshev polynomial defined by [/ n (cos 0) = sin((n + 1)0)/sin 0. 

191. (a) By exercise 190(c )， c(q) is the least j > 0 such that [x+x~ 1 )Fj[x~ 1 -\-l-\-x) = 0 
(modulo x 2q + 1 )， using polynomial arithmetic mod 2. Equivalently, it’s the smallest 
positive j for which Fj(y) is a multiple of (x 2q + 1) / (x 2 + 1) = (1 + x + • • • + x 9-1 ) 2 , 
when y = x _1 +l+^. 
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(b) Use the method of exercise 190(d) to evaluate ((x + x~ 1 )Fj(y)) mod (x 2q + 1) 
when j = M/p，for all prime divisors p of M• If the result is zero, set M M/p and 
repeat the process. If no such result is zero. c(q) = M. 

(c) We want to show that c(2 e ) is a divisor of 3 - 2 e_1 but not of 3 • 2 e — 2 or 
2 e ~ 1 . The latter holds because F 2 e-i(y) = y 2<3_1 一 1 is relatively prime to x 2G+1 +1. The 
former holds because 

o e — 1 i o c — 1 o g — 1 i o e o g — 1 i i q e 

F 3 . 2 e-i(y) = y 2 一 1 F 3 (y) 2 = y 2 - 1 (I + yf = y 2 -^x^+x) 2 , 


• • o c 1 2 e 

which is 三 0 modulo x +1 but not modulo x + 1. 

(d) F 2 ^-i (y) = X^ =1 y 2C - 2 k • Since y = x~ x (l+x+x 2 ) is relatively prime to a^ + 1 ， 
we have y— 1 三 ao -\-a\x + • • • + a q -ix q ~ 1 (modulo x q + 1) for some coefficients a^; hence 


y 


2 


k 


2 


k 


三 CLq CL\X + • • • + i X 


ao + ciix 


2 fc + 


+ a g 一 ix 


2 k + e (q-l) 


y 


2 


k + 


(modulo x q 1) for 0 < A: < e, and it follows that F 2 2 e_ 1 (y) is a multiple of x 2q + 1 . 

(e) In this case c(q) divides 4(2 2e — 1). Proof: Let x q 1 = /i(x)/2(x) • • • f r (x) 
where fi (x) = x + 1 ， f 2 (x) = x 2 x 1. and each fi(x) is irreducible mod 2. Since 
q is odd，these factors are distinct. Therefore, in the finite field of polynomials mod 
fj (x) for j > 3, we have y_ 2k = y - 2fc+e as in (d). Consequently F 2 2 e_ 1 (y) is a multiple 
of f 3 (x) ... f r (x) = (x q + l)/(x 3 + 1). So i 7 2(2 2 e -i) (y) = -^2 2e -i{y ) 2 a multiple of 
(x 2q + l)/(x 2 + 1) as desired. 

(f) If F c ( q ) [y) is a multiple of x 2g + l ， it’s easy to see that c(2q) = 2c(g). Otherwise 
F^ c {q)(y) is a multiple of F^{y) = (1 + y) 2 = x~ 2 {l + x) 4 ; hence F 6c ^ (y) is a multiple 
of x 4( i + 1 and c(2q) divides 6 c(g). The latter case can happen only when q is odd. 

Notes: Parity patterns are related to a popular puzzle called “Lights Out，” 
which was invented in the early 1980s by Dario Uri，also invented independently about 
the same time by Laszlo Meero and called XL.i::i:. [See David Singmaster^s Cubic 
Circular, issues 7 & 8 (Summer 1985) ， 39-42; Dieter Gebhardt, Cubism For Fun 69 
(March 2006) ， 23-25.] Klaus Sutner has pursued further aspects of this theory in 
Theoretical Computer Science 230 (2000) ， 49-73. 

192 * Let b^2i)(2j) = “ij ， 6(2 许 1)(2 彡） = 叫 ：/ ㊉ ^ 6(2i)(2j+i) = 叫 ：/ ㊉ 叫 ( 彡 +1)，and 

b(2i+i)(2j+i) = 0, for 0 < i < m and 0 < j < n, where we regard = 0 when z = 0 
or z = m + 1 or j = 0 or j = n + 1 . We don’t have (6(%” ， 6 (2i)2 , • • • ， 6( 叫 （2n+1) )= 
( 0 , 0 , •••，()）because ( 似， … ， a< n ) # ( 0 ，. ••，()）for 1 < z < m. And we don’t have 
(6(2i+i)i ， 6(2i+i)2, • • • ， 6(2i+i)(2n+i)) = (0, 0, • • • ， 0) because adjacent rows (a^_，• • • ， a《 n ) 
and (a(i+i)i, … ， a(i+i)n) always differ for 0 < z < m when m is odd. 

193. Set 沐 (1 《 （ n—i)) I (1 《 (z—1)) for 1 < z < m. where m = [n/2]. Also set 

(/3i & an) + (ft & o^i 2 ) H - h (/3 m & c^ m )，where aij is the jth row of the parity 

pattern that begins with 你 ； vector 7 ^ records the diagonal elements of such a matrix. 
Then set r 0 and apply subroutine N of answer 194 for i 1 ， 2,… ， m. The resulting 
vectors , 0 r are a basis for all n X n parity patterns with 8 -fold symmetry. 

To test if any such pattern is perfect, let the pattern starting with 0 i first be zero 
in row C{. If any Ci = n + 1, the answer is yes. If lcm(ci,..., c r ) < n, the answer 
is no. If neither of these conditions decides the matter, we can resort to brute-force 
examination of 2 r — 1 nonzero linear combinations of the 6 vectors. 

For example, when n = 9 we find 71 = 111101111, 72 = 73 = 010101010 ， 74 = 
000000000, 75 = 001010100; then r = 0,0 1 = 011000110, 0 2 = 000101000, c x = c 2 = 5. 

So there is no perfect solution. 
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In the author’s experiments for n < 3000， “brute force” was needed only when 
n = 1709. Then r = 21 and the values of Ci were all equal to 171 or 855 except that 


C 21 = 342. The solution 6\ ® 0 21 was found immediately. 

The answers for 1 < n < 383 are 4, 5, 11, 16, 23, 29, 30, 32, 47, 59, 62, 64, 65, 
84, 95, 101, 119, 125, 126, 128, 131, 154, 164, 170, 185, 191, 203, 204, 239, 251, 254, 
256, 257, 263, 314, 329, 340, 341 ， 371, 383. 


[A fractal similar to Fig. 20， called the “mikado pattern,” appears in a paper by 
H. Eriksson. K. Eriksson, and J. Sjostrand, Advances in Applied Math. 27 (2001) ， 365. 
See also S. Wolfram. A New Kind of Science (2002), rule 150R on page 439.] 

194. Set pi i— 1 《 (m — i) and 7 ^ a，i for 1 < z < m; also set r 0. Then perform 
the following subroutine for z = 1 , 2 . … ， m: 

Nl. [Extract low bit.] Set x 7 ^ & — 7 ^. If x = 0, go to N4. 

N2. [Find j.] Find the smallest j > 1 such that 7 j & x 7 ^： 0 and 7 ^ & (x — 1) = 0. 

N3. [Dependent?] If j < i. set ㊉ 7 ) ， 你 '㊉ 爲 ， and return to Nl. 

(These operations preserve the matrix equation C = BA.) Otherwise termi¬ 
nate the subroutine (because 7 ^ is linearly independent from 71 ， … ， 7 i-i). 

N4. [Record a solution.] Set r r + 1 and G r l pi. | 


At the conclusion, the m — r nonzero vectors 7 ^ are a basis for the vector space of all 
linear combinations of a\ , •… ， a m ; they’re characterized by their low bits. 

195. (a) # 0a; # cea3; # e7ae97; # f09d8581. 

(b) If Xx = \x\ the result is clear because l = V. Otherwise we have a 1 < a[. 

(c) Set j k: while ㊉ # 80 < # 40, set j j — 1. Then a(x^) begins with aj. 

196. (a) # 000a; # 03a3; # 7b97; # d834dd41. 

(b) Lexicographic order is not preserved when, say, x = # ffff and x = #10000. 

(c) To answer this question properly one needs to know that the 2048 integers 

in the range # d800 ^ x 〈 #e 000 are not legal codepoints of UCS: they are called 
surrogates. With this understanding. begins at ak if a ㊉ # dc00 > # 0400, 

otherwise it begins at 

197. a = # e5000000 ， b = 3. c = # fe. (We could let 6 = 0 ， but then a would be huge. 
This trick was suggested by P. Raynaud-Richard in 1997.) 

198. We want a\ > # cl; 2 s a\ + < # f490; and either (ai & —c^i) + ai < #100 or 

> # 17f. These conditions hold if and only if 


( # cl — ai) Sz (2 8 ai + 0^2 — # f490) & (((ai &—ai) +ai — # 100) | ( # 17f — ai — 0 ^ 2 )) < 0. 


Markus Kuhn suggests adding the further clause 4 Sz ( # 20 — ((2 8 ai + Oi 2 ) ㊉ # eda ))’， to 
ensure that ol\ol 2 doesn’t begin the encoding of a surrogate. 

199 . If $0 = (x 7 ... xiXo )256 then $3 = ( 阶，❽， 奶 ) = (X 7 &X 4 ) | (^7 &X 2 ) | &X 2 ). 

200 . M0R x ， c ， x, where c = # f Of Of Of OOf Of Of Of . 

201 . MOR x，x ， c ， where c = # c0c030300c0c0303; then M0R x ,mone 5 x. (See answer 206.) 

202 . a = #0008000400020001, b = # 0fOfOfOfOfOfOfOf, c = #0606060606060606, 
d = #0000002700000000, e = # 2 a 2 a 2 a 2 a 2 a 2 a 2 a 2 a. (The ASCII code for 0 is 6 + # 2 a; 
the ASCII code for a is 6 + # 2 a + 10 + # 27.) 

203 . p = #8008400420021001 ， g = # 8020080240100401 (the transpose of p), r = 
#4080102004080102 (a symmetric matrix). and m = # aa55aa55aa55aa55. 
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204 . Just change p to #0880044002200110. (Incidentally, these shuffles can also be 
defined as permutations on z = (之 63 ••• > 212 : 0)2 in another way: The outshuffle maps 
Zj ㈠' z ( 2 j) mod 63 while the inshuffle 

205 . (Solution by H. S. Warren, Jr.) The text’s 7-swap, 14-swap, 28-swap method can 
be implemented with only 12 instructions: 


maps Zj f—> 2 ： (2j + l) mod 65.) 


M0R t ,x ， cl; M0R t ， cl ， t; PUT rM ， ml; MUX y ， x ， t; 

MQR t ， y,c2; M0R t ， c2,t; PUT rM ， m2; MUX y ， y ， t; 

M0Rt ， y ， c3; M0Rt ， c3 ， t; PUT rM,m3; MUX y ， y ， t; 

here cl = #4080102004080102, c2 = #2010804002010804, c3 = #0804020180402010, 
ml = # aa55aa55aa55aa55 ， m2 = # cccc3333cccc3333 ， m3 = # f Of Of Of OOf Of Of Of . 

206. Four instructions suffice: MX0R y ， p ， x; MXQR x ， mone ， x; MX0R x ， x ， q; X0R x ， x ， y; 
here p = # 80c0e0f Of 8f cf eff ， mone = —1，and q = p. 

207. SLU x ， one ， x; MQR x ， b ， x; AND x ， x ， a; M0R x ， x，#ff ; here register one = 1. 

208* In general, element ij of the Boolean matrix product AXB is \J{xki | dik A bij}. 
For this problem we choose = [i C k] and bij = [l ^ j]; the answer is c M0R t ， f ， a; 
M0R t ， b ， t，where a = # 80c0a0f088ccaaff and b = # ff5533110f050301 = a T . 

A 

(Notice that this trick gives a simple test [/= / ] for monotonicity. Furthermore, 
the 64 - bit result (^63 - • • 亡 i 亡 o )2 gives the coefficients of the multilinear representation 

. . . ， X 6 ) = (t 6 3 + 亡 62 尤 6 H -+ t 1 X 1 X 2 X3X 4 X5 + ) mod 2 ， 


if we substitute MXQR for MQR, by the result of exercise 7.1.1—11.) 

209. If • denotes MX OR as in ( 183 ) and b = ( 冷 7 • • • / 3 i/ 3 o )256 has bytes 氏 ， we can evaluate 


c = (a-Bo ) ㊉ ((a 《 8 ) • (Bi -\-Bq )) ㊉ ((a 《 16) • (B 2 )) ㊉…•㊉ ((a 《 56)• (B^-\-B ^)), 

where Bf = (q/3j) & m, Bf = < 8 ) +/3j) & m, q = #0080402010080402, and 

m = # 7f3f If Of 07030100. (Here qf3j denotes ordinary multiplication of integers.) 


210. In this big-endian computation, register nn holds — n，and register data points 
to the octabyte following the given bytes a n _i ... aiao in memory (with a n -i first). 
The constants aa = # 8381808080402010 and bb = # 339bcf 6530180c06 correspond to 
matrices A and B, found by computing the remainders x k mod p(x) for 72 < k < 80. 


SET c，0 
ADD nn,nn ,8 
LDQU t,data,nn 
BZ nn ， 2F 
1H MX0R u ， aa，t 
MX0R v ， bb，t 
ADD nn,nn ,8 


c — 0. 
n 4— n — 8 . 
t next octa. 
Done if n = 0. 
U i — t • A. 
v i— t • B. 
n 4— n — 8 . 


LD0U 

t,data,nn 

XQR 


SLU 

c,v,56 

SRU 

v,v ,8 

XQR 

U ， U，V 

XQR 

t ， t，u 

PBN 

nn, IB 


t next octa. 

u u ® c. 

c v ^56. 
v ^ v ^ 8. 

U ^ U ^ V. 

t <— t ㊉ u. 
Repeat if n > 0. 


A similar method finishes the job，with no auxiliary table needed: 


I 


2H SET nn ，8 
3H AND xf 000 
MXOR u ， aaa，x 
MXQR v ， bbb，x 
SLU t ， t ，8 


n i — 8 . 

x high byte. 
u x • A!. 

v x • B f • 


SRU v ， v ， 8 
XQR t ， t，v 
SUB nn ， nn ， 1 
PBP nn ， 3B 
XQR t ， t，c 


v v 8. 

t <— t ^ V. 

n n — 1 . 
Repeat if n > 0. 

亡乂 一亡 ㊉ c. 


XQR t ， t，u t i— t ^ u. 


SRU crc ， t ， 48 Return t 48. | 


Here aaa = # 8381808080808080, bbb = # 0383c363331b0f 05 5 and ff 000 = # ff00...00. 


The Books of the Big-Endians have been long forbidden. 


LEMUEL GULLIVER, Travels Into Several Remote Nations of the World (1726) 


Warren 

swap 

MUX 

Boolean matrix product 

multilinear representation 

MXOR 

big-endian 

GULLIVER 

SWIFT 
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211 . By considering the irreducible factors of the characteristic polynomial of X ， 
we must have X n = / where n = 2 3 • 3 2 • 5 • 7 • 17 • 31 • 127 = 168661080. Neill 
Clift has shown that /(n — 1) == 33 and found the following sequence of 33 MXOR 
instructions to compute Y = X^ 1 = X n_1 : MXOR t ,x,x; MXOR $1,x; MXOR $2,t ,$1; 
MXQR $3, $2, $2; MXOR t,$3,$3; 5 6 ; MXOR 5 3 ; MXOR $l,t,$l; MXOR 

aS 13 ; MXQR MXQR y ， t ， x; here S stands for ‘MXOR To test if X is 

nonsingular，do MXOR t ， y，x and compare t to the identity matrix #8040201008040201 • 


Clift 

MXOR 

identity matrix 
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When an index entry refers to a page containing a relevant exercise, see also the answer to 
that exercise for further information. An answer page is not indexed here unless it refers to a 
topic not included in the statement of the exercise. 


0—1 matrices, 67—70, see also Bitmaps, 
multiplication of, 50-51, 56. 
transposing, 15, 56, 67, 69, 80. 
triangularizing, 68. 

0—1 principle, 54. 

—1 (the constant (… 111)2) ， 3 ， 8 ， 9 ， 

50, 71 ， 76, 106, 107. 

2-adic chains, 23-27, 37, 61, 91, 96. 

2-adic fractions, 9, 75. 

2-adic integers, 2, 16, 21, 53, 55. 

as a metric space, 74. 

2-bit encoding for 3-state data, 28—31 ， 63. 
2-cube equivalence, 29-30. 

2- dimensional data allocation, 16. 

2ADDU (times 2 and add unsigned), 79 ， 84. 

3- valued logic, 31, 63. 

4- neighbors, see Rook-neighbors, 40. 
4ADDU (times 4 and add unsigned), 79. 
8-neighbors, see King-neighbors, 40. 

8ADDU (times 8 and add unsigned), 79. 
16ADDU (times 16 and add unsigned) ， 79. 
oo (infinity) ， 8 ， 55. 

J-maps ， 84. 

(^-shifts, 16. 

6-swaps ， 13—15, 50, 55—56, 107. 

Xx (Llgr 」），see Binary logarithm. 

I_i (average memory access time), 117. 

/ifc and ~ fc ， see Magic masks. 
vx^ see Sideways addition. 
px, see Ruler function. 
v (instruction cycle time), 117. 

Absorption laws, 3. 

Abstract RISC (reduced-instruction-set 
computer) model, 26. 

Ackland，Bryan David, 44. 

Acyclic digraph, 33. 

Addition, 3, 19. 
bytewise, 19, 87. 
scattered, 18 ， 57. 
sideways, 2 ， 11-12, 55, 62, 94. 
Adjacency matrix of a graph, 28 ， 62. 
Adventure game ， 85. 

Agrawal, Dharma Prakash (^nt 5T^TT5T 
STIT^RT), 71. 

Albers, Susanne ， 88. 

Allocation of memory, 22 ， 54, 59. 
Allouche, Jean-Paul, 78. 

Alpha channels, 59. 

Alphabetic data, 20, 59. 

Analysis of algorithms, 55, 85. 

Ancestors in a forest, 33. 
nearest common, 33—35, 64. 


AND (bitwise conjunction), 2-3. 

Animating functions, 53, 56. 

Ariyoshi. Hiromu ( 有吉也 ) ， 92. 

Arndt, Jorg Uwe, 76, 84. 

ASCII: American Standard Code for 

Information Interchange, iv. 59. 69 ， 117. 
Associative laws, 3, 72. 

Asterisk codes for subcubes, 18 ， 63. 
Averages, bytewise, 19 ， 59. 

Background of an image, 42. 

Balanced branching functions, 53. 

Balanced ternary notation, 63. 79. 

Banyan networks, 81. 

Basic RAM (random-access machine) 
model, 26 ， 62 ， 91. 

Baumgart, Bruce Guenther, 12. 

Bays. John Carter, 77. 

BDIF (byte difference), 20. 86-87. 

Benes, Vaclav Edvard. 13. 

Bentley. Jon Louis, 95. 

Berlekamp, Elwyn Ralph, 21 ， 73 ， 98. 
Bernshtem, Sergei Natanovich (BepHinTeHH. 

Ceprefi HaTaHOBun). 102. 

BESM-6 (BECM-6) computer, 83. 

Beyer, Wendell Terry, 42. 

Bezier, Pierre Etienne ， splines. 48 ， 

66-67, 103. ' 

Big-endian convention, 6-8. 12. 20. 

77, 103, 107. 

Binary basis, 71. 

Binary logarithm (Xx = |lgx 10-11, 

21, 25, 33, 55, 60-61, 64. 

Binary recurrence relations. 8, 10, 55. 

Binary search trees, 64. 79. 

Binary tree structures. 32. 

Binary valuation, see Ruler function. 
Binary-coded decimal notation. 60. 

Bipartite graphs, 14-15. 97. 

Bit boards, 32, 63. 

Bit codes for subcubes. 18. 63. 

/ / 

Bit permutations, 13-17, 25, 50. 

Bitmaps, 39-48. 64-68. 
cleaning. 65. 
drawing on, 48. 

filling contours in. 44-48. 66-67. 
rotation and transposition of, 67. 

Bitwise manipulations, 1—108. 

Black pixels, 4, 40. 67. 

Bolyai, Janos ， 36. 

Bookworm problem, 54. 

Boolean matrices ， 50. 69. see also Bitmaps, 
multiplication of, 50 ， 56 ， 107. 
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Borkowski, Ludwik Stefan, 31. 

Borrows, 86-87. 

Boundary curves. digitized, 44—48. 

Bouton, Charles Leonard. 71. 

Branch instructions. 10, 48-49, 61. 
Branching functions, 53, 56. 

Branchless computation, 23-26, 48-49. 

69, 70. 

Bray more, Caroline, 1. 

Breadth-first search, 91, 97. 

Brent, Richard Peirce. 83. 

Bresenham, Jack Elton, 102. 

Breuer, Melvin Allen. 94. 

Broadword chains ， 23-27 ， 60-65. 96. 
strong, 61. 

Broadword computations ， 21-27, 60-65. 
Brodal, Gerth St0lting. 22. 

Brodnik, Andrej ， 27. 

Bron, Coenraad ， 92. 

Brooker, Ralph Anthony, 2. 

Brown, David Trent, 51. 

Bruijn, Nicolaas Govert de, cycles, 10. 
Biichi, Julius Richard, 75. 

Butterfly networks, 56. 

Byte: An 8-bit quantity, 69. 

Byte permutations, 50. 

Bytes, parallel operations on, see Multibyte 
processing. 

Cache memory. 5, 35, 49. 77. 91. 

Cahn, Leonard, 40. 

Cancellation law ， 72. 

Cantor, Georg Ferdinand Ludwig 
Philipp, 85. 

Cardinality of a set, 11. 

Carries, 18- 19, 25, 86-87. 

Cartesian coordinates. 44. 

j 

Cartesian trees, 79 ， 95. 

Cellular automata, 40-43, 65-66. 

Chebyshev (= Tschebyscheff), Pafnutii 
Lvovich (HeGbimeB, nacjDHyTHH 
JIbbobhh), polynomials, 104. 

Cheshire cat, 42— 43 ， 65 ， 66 ， 100. 
Chessboards, 32, 63. 

Chung, Kin-Man ( 鍾建民 ) ， 17, 58. 

Circles, digitized, 44, 47. 

Circular lists, 62. 100. 

Cleaning images, 65. 98. 

Clift, Neill Michael. 108. 

Cliques ， maximal, 62—63. 

Closed bitmaps, 65. 

Collation of bits. 2. 

j 

Colman, George, the younger, 1. 
Combinations, 75-76. 

Commutative laws, 3, 71. 

Comparison of bytes, 21, 60. 
Complementation. 3, 52, 92. 

Complete binary trees, 33. 74. 

infinite. 53. 

/ 


Composition of permutations, 53. 56—57. 
Compression of scattered bits, 16 ， 57, 83. 
Conditional-set instructions, 9-10, 48. 
Conic sections, digitizing, 44-48, 66-67. 
Conic splines, 103. 

Conjunction, in 3-valued logic, 31. 
Connectivity structure of an image. 

41-43, 66. 

Consensus of sub cubes, 63. 

Continuant polynomials, 104. 

Control points, 48. 

Convex optimization, 85. 

Conway, John Horton, 40, 73, 74. 98. 

field. 52. 

/ 

CRC (cyclic redundancy check), 51 ， 70. 
Crossbar modules, 14. 58. 

CSNZ (conditional set if nonzero). 10. 

48-49, 88. ’ 

CS0D (conditional set if odd). 79. 

CSZ (conditional set if zero), 9. 77, 78. 
Custering, 39, 44. 64-65. 

Cycles in a graph, 15. 

Cyclic redundancy checking, 51. 70. 

Cyclic shifts, 17, 56, 86. 

Cylinder, hyperbolic, 39 ， 97. 

Dalios, Jozsef, 9. 

Dates, packed ， 4 ， 60. 
de Bruijn, Nicolaas Govert, cycles, 10. 
Depth of a Boolean function. 13. 
Descartes, Rene, coordinates, 44. 

Dietz, Henry Gordon. 19. 85. 

Digitization of contours, 44-48 ， 66-67. 
Dijkstra, Edsger Wybe. 85. 

Dirichlet，Johann Peter Gustav Lejeune, 
generating function, 78. 

Discrete logarithm, see Binary logarithm. 
Disjointness testing, 58. 

Disjunction, in 3-valued logic, 31. 

Distance between 2-adic integers. 74. 
Distinct bytes, testing for. 59. 

Distribution networks, see Mapping 
networks. 

Distributive laws, 3, 72. 

Divide and conquer paradigm, 12. 16. 
Division, 54. 
avoiding, 4 ， 54. 
by 10, 24. 

by powers of 2. 3-4. 
in Conway’s field. 52. 
of 2-bit numbers, 59. 

Dominating sets ， minimum, 98. 
Don’t-cares ， 18 ， 29-30. 81, 94. 

Dot-minus operation (x —y), v. 20 ， 

24, 61 ， 82, 96. 

Double order for traversing trees. 100-101. 
Drawing on a bitmap. 48. 

Duality between 0 and 1, 99. 

Duguid, Andrew Melville, 13. 

DVIPAGE program, 103. 
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Edges between pixels. 44-48. 66-67. 

EDS AC computer. 2, 11. 

Eight queens problem. 92. 

Ellipses ， 44—47, 103 .' 

Encoding of ternary data, 28-31, 63. 

Eofill (even/odd filling), 47. 

Equality of bytes. 20. 59. 60. 

Equivalence, in 3-valued logic, 63. 
Eratosthenes of Cyrene (^EpaToaGsvri? 

6 Kupr]vaiO(；); sieve (xoaxtvov). 5, 54. 
Eriksson. Henrik, 106. 

Eriksson. Kimmo, 106. 

Escher, George A.. 37. 

Escher, Maurits Cornells, 37. 

Euclid (EuxXsi^ri^), 36. 

Extracting bits, 2, 4. 8. 

and compressing them, 16 ， 57. 83. 
the least significant only (2 px )， 

8—10, 18, 54. 

the most significant only (2 Ax ), 11 ， 

60— 62, 89. 

Fast Fourier transforms, 56.
Ferranti Mercury computer, 2.
Fibonacci, Leonardo, of Pisa (= Leonardo filio Bonacii Pisano), numbers, 36.
Fibonacci number system, 36, 64; see also NegaFibonacci number system.
  odd, 96.
Fibonacci polynomials, 67-68.
Fields, algebraic, 50, 52, 105.
Fields of data, see Packing of data.
Filling a contour in a bitmap, 44-48, 66-67.
Fingerprints, 40.
Finite fields, 50, 105.
Finite state automata, 89.
Fischer, Johannes Christian, 95.
Fisher, Randall James, 19, 85.
Fixed point arithmetic, 86, 102.
Flag: A 1-bit indicator, 20, 59, 60.
Floating point arithmetic, 10, 78.
Floyd, Robert W, 58.
Footprints, 87, 93.
Fractals, 68, 78.
Fractional precision, 4, 69.
Fragmented fields, 18, 58.
Fredman, Michael Lawrence, 22, 60.
Freed, Edwin Earl, 55.
Frey, Peter William, 94.
Fuchs, David Raymond, 99.
Full adders, 98.
  for balanced ternary numbers, 63.
Gabow, Harold Neil, 95.
Games, 40, 52, 85.
Gaps, between prime numbers, 77.
  between Ulam numbers, 93.
  in a scattered accumulator, 85.
Garbage collection, 27.
Gardner, Martin, 40, 98.
Gathering bits, 83.
Gauß (= Gauss), Johann Friderich Carl (= Carl Friedrich), 36.
Gebhardt, Dieter, 105.
Generating functions, 55, 57.
  Dirichlet, 78.
Gill, Stanley, 11.
Gillies, Donald Bruce, 11.
Gladwin, Harmon Timothy, 8.
Gosper, Ralph William, Jr., v, 56, 70.
  hack, 4, 54.
Graphs, 14-15.
  algorithms on, 27-28, 62-63.
Gray, Frank, binary code, 73, 89.
Gray levels in image data, 59, 67.
Greedy-footprint heuristic, 87, 93.
GREG (global register definition), 9, 12.
Grid structure, 36, 98.
Group of functions, 53.
Groupoids, multiplication tables for, 31, 63.
Grundy, Patrick Michael, 71.
Guibas, Leonidas John (Γκίμπας, Λεωνίδας Ἰωάννου), v, 103.
Gulliver, Lemuel, 107.
Guo, Zicheng Charles (郭自成), 41, 65.
Guy, Richard Kenneth, 73, 98.
Hacks, 1-108.
Hagerup, Torben, 88.
HAKMEM, 26, 71, 75.
Half adders, 98.
  for balanced ternary numbers, 63.
Hall, Richard Wesley, Jr., 41, 65.
Hamburg, Michael Alexander, 75.
Hardy, Godfrey Harold, 75.
Harel, Dov, 33.
Heaps, 32.
  sideways, 32, 63-64.
Heckel, Paul Charles, 82.
Herrmann, Francine, 36.
Heun, Volker, 95.
Hexadecimal constants, v.
Hexadecimal digits, 69.
Hobby, John Douglas, 48, 103.
Holes in images, 42-43.
Hollis, Jeffrey John, 62.
Hudson, Richard Howard, 77.
Hunt, Neil, 103.
Hyperbolas, 44, 66, 75, 103.
Hyperbolic plane geometry, 35-39, 47, 64, 97.
Ide, Mikio (井手幹生), 92.
Identities for bitwise operations, 3, 52, 53, 55, 75, 77, 86.
Identity matrix, 108.
ILLIAC I computer, 11.
Implication, in 3-valued logic, 31.
Implicit data structures, 32-39, 63-64.
Independent sets, maximal, 63.
Infinite binary trees, 53.
Infinite exclusive or, 74.
Infinite-precision numbers, 2, 4, 52.
Inorder of nodes, 33.
Inshuffles, 69, 80.
Inside of a curve, 44.
Interchanging two bits, 55.
Interchanging selected bits, 71.
Interleaving bits, 16, 59; see also Perfect shuffles, Zipper function.
Internet, ii, iii.
Inverse of a binary matrix, 70.
Inverse of a permutation, 50.
Isometries, 74.
Jardine, Nicholas, 92.
Johnson, David Stifler, 92.
Jordan, Marie Ennemond Camille, curve theorem, 44.
Kaas, Robert, 32.
Kaasila, Sampo Juhani, 102.
Katajainen, Jyrki Juhani, 35.
Kerbosch, Joep A. G. M., 92.
King-neighbors, 40.
Kingwise connected components, 41-43, 65-66.
Kirsch, Russell Andrew, 40, 65.
Knight moves, 63.
Knuth, Donald Ervin (高德纳), i, v, 22, 77, 78, 93, 99, 103, 106.
Kuhn, Markus Günther, 106.
Lakhtakia, Akhlesh, 75.
Lamport, Leslie B., 19, 20, 59.
Lander, Leon Joseph, 77.
Large megabytes, 77.
Largest element of a set, 11.
Larvala, Samuli Kristian, 83.
Latin-1 supplement to ASCII, 85.
Lauter, Martin, 10.
Lawrie, Duncan Hamish, 81.
Le Corre, J., 13.
Leap year, 88.
Least common ancestors, see Nearest common ancestors.
Least significant 1 bit ($2^{\rho x}$), 8-9, 54.
Lee, Ruby Bei-Loh (李佩露), 83.
Left-to-right minimum, 95.
Leftmost bits, 10-11, 22, 55.
Lehmer, Derrick Henry, 4.
Leiserson, Charles Eric, 10, 55.
Lenfant, Jacques, 80.
Lenstra, Hendrik Willem, Jr., 52, 73, 74.
Levialdi Ghiron, Stefano, 42-43.
Lexicographic order, 18, 68.
lg, see Binary logarithm.
Life, 40, 65.
Lights Out puzzle, 105.
Linked allocation, 91.
Little-endian convention, 6-8, 12, 20, 28, 76, 77.
Littlewood, John Edensor, 75.
Lobachevsky, Nikolai Ivanovich (Лобачевскій, Николай Ивановичъ), 36.
Loukakis, Emmanuel (Λουκάκης, Μανώλης), 92.
Lower bounds, 23-27, 61-62, 104.
Lowercase letters, 59.
Lowest common ancestor, see Nearest common ancestors.
Loyd, Samuel, 77.
Łukasiewicz, Jan, 31, 63.
Lutz, Rüdiger Karl (= Rudi), 101.
Lynch, William Charles, 11.
Magic masks ($\mu_k$ and $\mu_{d,k}$), 9-12, 16, 22, 37, 54, 71, 75, 76, 78-80, 82, 84, 88, 96, 103.
Majority function, 27, see also Median function.
Malgouyres, Rémy, 99.
Manchester Mark I computer, 2.
Mann, William Fredrick, 98.
Mapping modules, 58, 81.
Mapping networks, 58, 81.
Mapping three items into two-bit codes, 28-31, 63.
Mappings of bits, 17, 58, 81.
Margenstern, Maurice, 36.
Mark II computer (Manchester/Ferranti), 2.
Martin, Monroe Harnish, 10.
Mask: A bit pattern with 1s in key positions, 9, 16-18, 20, 49, 50, 69.
Masking: ANDing with a mask, 31.
Matrices of 0s and 1s, 67-70; see also Bitmaps.
  multiplication of, 50-51, 56.
  transposing, 15, 56, 67, 69, 80.
  triangularizing, 68.
Matrix multiplication, 50-51, 56.
Matrix transposition, 15, 56, 67, 69, 80.
max (maximum) function, 2, 31, 60.
Maximal cliques, 62.
Maximal independent sets, 92.
Maximal proper subsets, 58.
Maybe, 31.
McCranie, Judson Shasta, 93.
Median function, v, 21, 86, 87.
Mérő, László, 105.
Mems: Memory accesses.
Merge sorting, 49.
METAFONT, 103.
mex (minimal excludant) function, 52.
Mikado pattern, 106.
Miller, Jeffrey Charles Percy, 11.
Miltersen, Peter Bro, 27.
min (minimum) function, 2, 31, 60.
Minimal excludant, 52.
Minimum element in subarray, 64.
Minsky, Marvin Lee, 66.
Mixed-radix representation, 60.
MMIX, ii, iv, 5, 7-10, 12, 19, 20, 28, 48-51, 54, 55, 57, 59, 60, 62, 67, 69, 70, 73, 79, 84, 86, 87.
mod (remainder) function, 4.
Modal logic, 31, 63.
mone, 76, see $-1$.
Monotone Boolean functions, 70.
Monotonic portions of curves, 45-47, 66.
Monus operation ($x \dotminus y$), v, 20, 24, 61, 82, 96.
Moody, John Kenneth Montague (= Ken), 62.
MOR (multiple or), 12, 19, 50-51, 56, 69-70, 86, 94, 103.
Morton, Guy Macdonald, 85.
Most significant 1 bit ($2^{\lambda x}$), 2, 11, 60-62, 89.
MP3 (MPEG-1 Audio Layer III), 51.
Muller, David Eugene, 11.
Multibyte encoding, 68-69.
Multibyte processing, 19-23, 59-61.
  addition, 19, 87.
  comparison, 20-21.
  max and min, 60, 88.
  potpourri, 87.
  subtraction, 59, 87.
Multilinear representation of a Boolean function, 107.
Multiple-precision arithmetic, 6.
Multiplication, 4, 10-11, 22, 61, 78.
  avoiding, 21, 22, 59, 78.
  by powers of 2, 3, 78.
  in Conway’s field, 52.
  in groupoids, 31, 63.
  lower bound for, 22, 26, 62.
  of 0-1 matrices, 56; see also MOR and MXOR.
  of polynomials mod 2, 70.
  of signed bits, 29-30.
Munro, James Ian, 27.
MUX (multiplex), 50, 83, 86, 103, 107.
MXOR (multiple xor), 50-51, 56, 69-70, 73, 107, 108.
Mycroft, Alan, 20.
Navigation piles, 35, 64.
Nearest common ancestors, 33-35, 64.
Necessity, in 3-valued logic, 63.
Negabinary number system, 52.
Negadecimal number system, 37.
NegaFibonacci number system, 36-39, 64.
Negation, 3, 52, 63.
Nested parentheses, 54.
Newline symbol, 20.
Nicely, Thomas Ray, 77.
Nim, 2, 52.
  addition, 2, 52.
  division, 52.
  multiplication, 52, 73.
  second-order, 52.
Noisy data, 65.
Non-Euclidean geometry, 35-36, 97.
NOT (bitwise complementation), 2.
Notational conventions, v, 81.
  $\langle xyz \rangle$ (median of three), v.
  $u \to^{*} v$ (transitive closure), 27.
  $\bar{x}$ or $\sim x$ (bitwise complement), 3.
  $x^{\oplus}$ (suffix parity), 55.
  $x \mathbin{\&} y$ (bitwise AND), 3.
  $x \mid y$ (bitwise OR), 3.
  $x \oplus y$ (bitwise XOR), 3.
  $x \ll y$ (bitwise left shift), 3.
  $x \gg y$ (bitwise right shift), 3.
  $x \ddagger y$ (zipper function), see Zip.
  $x \dotminus y$ ($\max(x - y, 0)$), v, 20, 24, 61, 82, 96.
  $(x\,?\,y\colon z) = xy + \bar{x}z$ (mux), 60, 62.
  z -I- x (sheep-and-goats), 17, 57.
NP-hard problems, 57.
Null spaces, 68.
NXOR (not xor), 79.
Nybble: A 4-bit quantity, 12, 69.
Nyp: A 2-bit quantity, 12, 69.
Objects in images, 42.
Octabyte or octa: A 64-bit quantity, 69.
Odd Fibonacci number system, 96.
Ofman, Yuri Petrovich (Офман, Юрий Петрович), 84.
Omega network for routing, 56-57.
One-to-many mapping, 17, 30.
Ones counting, see Sideways addition.
Online algorithms, 42-43, 66.
Open bitmaps, 65.
Optical character recognition, 40, 65.
OR (bitwise disjunction), 2-3.
Ordinal numbers, 73.
Oriented forests and trees, 33, 42.
Oriented paths, 27.
Outshuffles, 56, 69.
Outside of a curve, 44.
Overflowing memory, 92.
Packed data, operating on, 4, 19-21, 31, 59-60, 63, 69.
Packing of data, 4-6, 16, 31, 54, 69, 83.
Page faults, 59.
Paley, Raymond Edward Alan Christopher, 54, 75.
Papadimitriou, Christos Harilaos (Παπαδημητρίου, Χρίστος Χαριλάου), 92.
Papert, Seymour Aubrey, 66.
Parabolas, 44, 66.
Parallel processing of subwords, 19-23, 59-61.
Parenthesis traces, 54.
Parity function, 27, 62, 73, 79.
  suffix, 55, 69, 91.
Parity patterns, 67-68.
Parkin, Thomas Randall, 77.
Patents, 79, 83.
Paterson, Michael Stewart, 22, 90, 91.
Pattern recognition, 40.
Patterns, searching for, 20-22, 61.
Pentagrid, 36-39, 64.
Perez, Aram, 51.
Perfect hash functions, 78.
Perfect parity patterns, 67-68.
Perfect shuffles, 16, 50, 56, 57, 69, 80, 88.
Period length, 62.
Permutation matrices, 50.
Permutation networks, 13-15, 57-58, 81.
Permutations,
  induced by index digits, 56.
  of bits within a word, 13-17, 25, 50.
  of bytes within a word, 50.
  of the 2-adic integers, 53.
  Omega-routable, 56-57.
Perpendicular lines, 36.
Peterson, William Wesley, 51.
Phi ($\phi$), 64.
Pi ($\pi$), as “random” example, 17.
Pickover, Clifford Alan, 75.
Pigeonhole principle, 104.
Pipelined machine, 48-49.
Pitteway, Michael Lloyd Victor, 45.
Pixel algebra, v, 40.
Pixel patterns, 4, 53.
Pixels, 39-48, 64-68.
  gray, 59, 67.
Pólya, György (= George), 75.
Polynomials modulo 2, 57.
  multiplication of, 70.
  remainders of, 57, 68.
Polynomials modulo 5, 60.
Population count, 11, see Sideways addition.
Portability, 7.
Possibility, in 3-valued logic, 63.
Pratt, Vaughan Ronald, 54, 58, 81, 84, 89, 103.
Prefix problem, see Suffix parity function.
Preorder of nodes, 33-35.
Presume, Livingstone Irving, 55.
Prime implicants, 63.
Prime numbers, 5, 54.
Printing, 39.
Priority queues, 35.
Pritchard, Paul Andrew, 77.
Prodinger, Helmut, 78.
Program counter, 26.
Projection functions, 9.
Prokop, Harald, 10, 55.
Quadratic forms, 45-47, 66-67.
Quadtrees, 85.
Quantifications, 74, 89.
Queen graph, 92.
Quick, Jonathan Horatio, 53, 58.
Quilt, 4.
Rabin, Michael Oser, 84.
Radix $-2$, 52.
Radix conversion, 60.
Radix exchange sort, 91.
RAM (random-access machine), 26-27, 62, 91.
Ramshaw, Lyle Harold, 21.
Randall, Keith Harold, 10, 55.
Randomized data structures, 79.
Range checking, 60.
Range minimum query problem, 64.
Rank of a binary matrix, 68.
Rasters, 39, see Bitmaps.
Rational 2-adic numbers, 61.
Ray, Louis Charles, 40.
Raynaud-Richard, Pierre, 106.
Reachability problem, 27-28, 33.
Rearrangeable networks, see Permutation networks.
Recurrence relations, 8, 10, 37, 51, 55, 67.
Recursive processes, 15, 17, 32, 52, 72, 84.
Redundant representations, 104.
Reflection of bits, 12-13, 25, 55, 56, 96, 97.
Regular languages, 61.
Reitwiesner, George Walter, 55.
Remainder mod $2^n - 1$, 11.
Remainder mod $2^n$, 4.
Removal of bits, 8.
Replication of bits, 17, 58, 88.
Representation,
  of graphs, 27, 62.
  of permutations, 57.
  of sets as integers, 11, 18, 27-28, 58, 62-63, 75.
  of three states with two bits, 28-31, 63.
Reversal of bits, 12-13, 25, 55, 56, 96, 97.
Rhoads, Glenn Charles, 84.
Right-to-left minimum, 95.
Rightmost bits, 8-10, 54.
Rochdale, Simon, 1.
Rokicki, Tomas Gerhard, v, 55, 79.
Rook-neighbors, 40.
Rookwise connected components, 41-43, 65-66.
Rosenfeld, Azriel, 41.
Rotation of square bitmaps, 67.
Rounding, 33, 86.
  to an odd number, 2, 59, 86.
Ruler function ($\rho x$), 8, 20, 21, 25, 26, 28, 32, 53, 55, 60, 64, 78, 100.
  summed, 95.
Runlength encoding, 100, see also Edges between pixels.
Runs of 1s, 8, 11, 22-23, 55, 61.
Rutovitz, Denis, 40.
S, the letter, 48.
S1S, 74-75.
Saccheri, Giovanni Girolamo, 36.
SADD (sideways addition), 9, 28, 76, 78, 79.
Samet, Hanan, 85.
Saturated addition, 92.
Saturated subtraction, see Monus.
Scattered arithmetic, 18, 58.
  addition, 18, 57.
  shifting, 58.
  subtraction, 58.
Scattering bits, 83.
Schieber, Baruch Menachem, 33.
Schläfli, Ludwig, 97.
Schroeppel, Richard Crabtree, 26, 52, 82.
Seal, David, 10.
Second-order logic, 74-75.
Security holes, 69.
Segmented broadcasting, see Stretching bits.
Segmented sieves, 77.
Sequential allocation, 91.
SET, the game, 93.
Sets, represented as integers, 11, 18, 27-28, 58, 62-63, 75.
  maximal proper subsets of, 58.
Shades of gray, 67.
Shallit, Jeffrey Outlaw, 78.
Sheep-and-goats operation, 17-18, 57.
Shi, Zhi-Jie Jerry (史志杰), 83.
Shift instructions, 3, 19, 52, 61.
  signed, 49, 78.
  table lookup via, 5, 23, 69, 85, 88.
Shift sets, 24-25.
Shirakawa, Isao (白川功), 92.
Shrinking of images, 42-43, 66.
Shuffle network for routing, 56.
Sibling links, 32, 63.
Sibson, Robin, 92.
Sideways addition, 2, 11-12, 55, 62, 79, 94.
  bytewise, 11, 88.
  function $\nu x$, 11, 27, 55, 78.
  summed, 55, 82.
Sideways heaps, 32, 63-64.
Sieve of Eratosthenes, 5, 54.
Signed bits, representation of, 29, 55.
Signed right shifts, 49, 78.
SIMD (single instruction, multiple data) architecture, 19.
Simply connected components, 43.
Singmaster, David Breyer, 105.
Six-register algorithm, 103.
Sjöstrand, Jonas Erik, 106.
Slanina, Matteo, 74.
Sleator, Daniel Dominic Kaplan, 4, 98.
Slepian, David, 13.
Smallest element of a set, 11.
Smearing bits to the right, 8, 11, 78.
Sorted data, 54.
Sorting, 60, 75.
  networks for, 58.
Soule, Stephen Parke, 87.
Sprague, Roland Percival, 71.
Squaring a polynomial, 57.
Squines, 48, 66, 102.
SR (shift right, preserving the sign), 10, 49, 76, 78.
SRU (shift right unsigned), 5, 78.
Stanford GraphBase, ii, iii.
Steele, Guy Lewis, Jr., 16, 57, 80, 83.
Sterne, Laurence, iii.
Stockmeyer, Larry Joseph, 81, 84.
Stockton, Fred G., 102.
Stolfi, Jorge, v, 103.
Storage allocation, 22, 54, 59.
Strachey, Christopher, 12.
Straight lines, digitizing, 66.
Stretching bits, 58, 88.
Strings, searching for special bytes in, 20.
Strong broadword chains, 61.
Subcubes, 18, 63.
Subsets, 11, 27-28, 62-63, 75.
  generating all, 18.
  maximal proper, 58.
Subtraction, 3, 52, 59.
  bytewise, 59, 87.
  saturated, see Monus.
  scattered, 58.
Suffix parity function, 55, 69, 91.
Sum of bits, see Sideways addition.
  weighted, 55.
Surrogates, 106.
Surroundedness tree, 43, 66.
Sutner, Klaus, 105.
Swapping bits, 12-15, 55-56, 107.
  between variables, 71.
SWAR methods, 19-23, 59-61.
SWARC compiler, 85.
Swift, Jonathan, 107.
Sylow, Peter Ludvig Mejdell, 2-subgroup, 74.
Symmetric functions, Boolean, 62.
Symmetric group, 74.
Symmetric order of nodes, 33.
Table lookup by shifting, 5, 23, 69, 85, 88.
Tarjan, Robert Endre, 33, 95.
Ternary vectors, 31.
Tessellation, 36, 47, 64.
Tetrabyte or tetra: A 32-bit quantity, 69.
Thinning an image, 40-41, 65.
Thompson, Kenneth Lane, 68.
Three-register algorithm, 45-48, 66.
Three-state encodings, 28-31, 63.
Three-valued logic, 31, 63.
Tiling, 36, 47, 64.
Time, mixed-radix representation of, 60.
Tocher, Keith Douglas, 2, 59, 85.
Toruses, 65, 87.
Trailing zeros, 8, see Ruler function.
Transdichotomous methods, see Broadword computations.
Transitive closure, 27, 33.
Transposing a 0-1 matrix, 15, 56, 67, 69, 80.
Transposed allocation, 77.
Traversal in postorder, 95, 100-101.
Traversal in preorder, 94, 100-101.
Treaps, 79.
Triangularizing a 0-1 matrix, 68.
Tricks versus techniques, 2, 103.
Trinomials, 57.
Triply linked trees, 94, 95, 100.
TrueType, 103.
Truth tables, 9, 70.
Tsukiyama, Shuji (築山修治), 92.
Turing, Alan Mathison, 2.
  machines, 98.
Two’s complement notation, 2, 26, 71.
Typesetting, 39.
UCS (Universal Character Set), 69.
Ulam, Stanisław Marcin, 93.
  numbers, 63.
Ultraparallel lines, 36.
Unary notation, 60.
Unbiased rounding, 59, 86.
Uncompressing bits, 57.
Underflow mask, 90.
Unger, Stephen Herbert, 19.
Unicode, 69.
Universal Character Set, 69.
Unpacking of data, 2, 4-6, 57, 83.
Unsigned 2-adic integers, 71.
Unsolvable problems, 75.
Upper halfplane, 97.
Uppercase letters, 59.
Urban, Genevie Hawkins, 40.
Uri, Dario, 105.
UTF-8: 8-bit UCS Transformation Format, 69.
UTF-16: 16-bit UCS Transformation Format, 69.
van Emde Boas, Peter, 32.
Van Wyk, Christopher John, 99.
Variance, 57.
Veblen, Oswald, 44.
Vector space, basis for, 74, 106.
Vertex covers, minimal, 63.
Vishkin, Uzi Yehoshua, 33.
Vitale, Fabio, 35.
Vuillemin, Jean Étienne, 95.
Wada, Eiiti (和田英一), 76.
Warren, Henry Stanley, Jr., 8, 11, 12, 25, 51, 52, 71, 78, 83, 86, 107.
Wegner, Peter (= Weiden, Puttilo Leonovich = Вейден, Путтило Леонович), 8, 12.
Weighted sum of bits, 55.
Welter, Cornelis P., 74, 75.
Weste, Neil Harry Earle, 44.
Wheeler, David John, 11.
White pixels, 4, 40, 67.
Wilkes, Maurice Vincent, 11.
Willard, Dan Edward, 22, 60.
Wilson, David Whitaker, 93.
Wolfram, Stephen, 106.
Wong, Chak-Kuen (黃澤權), 17, 58.
Woodrum, Luther Jay, 77.
Woods, Donald Roy, 85.
Wraparound parity patterns, 67.
Wunderlich, Charles Marvin, 93.
Wyde: A 16-bit quantity, 69.
XL25 game, 105.
XOR (bitwise exclusive or), 2.
  identities involving, 3, 53, 55, 75.
Yannakakis, Mihalis (Γιαννακάκης, Μιχάλης), 92.
Z order, see Zip.
Zero-one principle, 54.
Zero-or-set instructions, 9, 10, 88.
Zeta function, 78.
Zijlstra, Erik, 32.
Zimmermann, Paul Vincent Marie, 83.
Zip, the zipper function, 16, 50, 57, 66, 77, 80, 83, 85.
Zip-fastener method, 85.
ZSNZ (zero or set if nonzero), 10, 88.
ZSZ (zero or set if zero), 9.
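
MMIX OPERATION CODES

Codes #x0-#x7 appear on the first line of row #x and codes #x8-#xF on the second; a name ending in [I] or [B] occupies two consecutive codes (the second is its immediate or backward form), and [UN]SAVE denotes SAVE (#FA) and UNSAVE (#FB).

      #0/#8      #1/#9      #2/#A      #3/#B      #4/#C      #5/#D      #6/#E      #7/#F
#0x   TRAP       FCMP       FUN        FEQL       FADD       FIX        FSUB       FIXU
      FLOT[I]               FLOTU[I]              SFLOT[I]              SFLOTU[I]
#1x   FMUL       FCMPE      FUNE       FEQLE      FDIV       FSQRT      FREM       FINT
      MUL[I]                MULU[I]               DIV[I]                DIVU[I]
#2x   ADD[I]                ADDU[I]               SUB[I]                SUBU[I]
      2ADDU[I]              4ADDU[I]              8ADDU[I]              16ADDU[I]
#3x   CMP[I]                CMPU[I]               NEG[I]                NEGU[I]
      SL[I]                 SLU[I]                SR[I]                 SRU[I]
#4x   BN[B]                 BZ[B]                 BP[B]                 BOD[B]
      BNN[B]                BNZ[B]                BNP[B]                BEV[B]
#5x   PBN[B]                PBZ[B]                PBP[B]                PBOD[B]
      PBNN[B]               PBNZ[B]               PBNP[B]               PBEV[B]
#6x   CSN[I]                CSZ[I]                CSP[I]                CSOD[I]
      CSNN[I]               CSNZ[I]               CSNP[I]               CSEV[I]
#7x   ZSN[I]                ZSZ[I]                ZSP[I]                ZSOD[I]
      ZSNN[I]               ZSNZ[I]               ZSNP[I]               ZSEV[I]
#8x   LDB[I]                LDBU[I]               LDW[I]                LDWU[I]
      LDT[I]                LDTU[I]               LDO[I]                LDOU[I]
#9x   LDSF[I]               LDHT[I]               CSWAP[I]              LDUNC[I]
      LDVTS[I]              PRELD[I]              PREGO[I]              GO[I]
#Ax   STB[I]                STBU[I]               STW[I]                STWU[I]
      STT[I]                STTU[I]               STO[I]                STOU[I]
#Bx   STSF[I]               STHT[I]               STCO[I]               STUNC[I]
      SYNCD[I]              PREST[I]              SYNCID[I]             PUSHGO[I]
#Cx   OR[I]                 ORN[I]                NOR[I]                XOR[I]
      AND[I]                ANDN[I]               NAND[I]               NXOR[I]
#Dx   BDIF[I]               WDIF[I]               TDIF[I]               ODIF[I]
      MUX[I]                SADD[I]               MOR[I]                MXOR[I]
#Ex   SETH       SETMH      SETML      SETL       INCH       INCMH      INCML      INCL
      ORH        ORMH       ORML       ORL        ANDNH      ANDNMH     ANDNML     ANDNL
#Fx   JMP[B]                PUSHJ[B]              GETA[B]               PUT[I]
      POP        RESUME     [UN]SAVE              SYNC       SWYM       GET        TRIP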